This post is about the original OMAP3530 BeagleBoard from 2008. Yes, the one so old that it doesn’t even show up in the board list on BeagleBoard.org anymore. The BeagleBoard, not the BeagleBone. During my Chumby 8 kernel escapades, at one point I ran into a UART bug that affected multiple drivers, including the omap-serial driver. This led me to buy a BeagleBoard so I could verify the omap-serial bug on hardware.

After I figured out the bug with the UART driver, I realized that the OMAP3530 has support for booting from USB, so I decided to go off on a random tangent to get USB boot working. There was no problem I was trying to solve or anything like that. I just thought it would be a fun experiment (am I a masochist?). Little did I know, I would be getting myself into some tricky USB packet analysis.

I struggled to find info about this process because of how old the OMAP is today. The main utility I found was a program called omap_loader by Grant Hernandez, which is a newer rewrite of Martin Mueller’s original omap3_usbload circa 2008. Thanks to some lucky searching combined with the Internet Archive, I connected the dots between 2008 and the present. At some point before 2013, Rick Bronson provided an update to omap3_usbload (along with a patch to TI’s X-Loader bootloader) that enabled uploading additional files like a full U-Boot and Linux kernel into RAM after X-Loader, all through USB. This unlocked the ability to boot all the way to Linux from a completely blank BeagleBoard. Grant’s newer omap_loader utility also incorporates these same improvements.

All of this research was difficult. Many of the links I found pointed to sites like gitorious.org and arago-project.org, both of which no longer exist (although Arago’s Git repos are now hosted by TI). eLinux.org’s BeagleBoard wiki was totally rearranged at some point and lost its info about USB recovery, and Rick’s site no longer exists, but as usual, the Internet Archive saved the day.

At some point later on, X-Loader was replaced by U-Boot SPL, so I think that is partially why so much of this info eventually disappeared from the web. But it’s a darn shame. This USB booting functionality is really cool, and it seems like most of the documentation for it has slowly gone by the wayside! The main breadcrumbs remaining on modern Google are the newer omap_loader utility, and also some references to Nest thermostats. For example, Nest’s X-Loader had the USB patch applied (with some tweaks added).

With all that research out of the way, I was ready to try it all out. I compiled omap_loader, grabbed the pre-built binary of x-load.bin that was included with Rick’s patchset, and also used a u-boot.bin that I had compiled myself using Buildroot while performing my UART tests with a modern kernel on the BeagleBoard. Then, I tried to load it:

$ sudo ./omap_loader -p 0xd009 -f x-load.bin -f u-boot.bin -a 0x80800000 -j 0x80800000 -v
OMAP Loader 1.0.0
File 'x-load.bin' at 0x40200000, size 26956
File 'u-boot.bin' at 0x80800000, size 777760
[+] scanning for USB device matching 0451:d009...

The idea behind this command is it sends X-Loader (x-load.bin) as the main payload that the OMAP’s on-chip bootloader is listening for over USB. Then, X-Loader starts up. Next, omap_loader sends any additional files using X-Loader’s USB protocol. In this case, I’ve supplied one extra file: u-boot.bin, which I told it to load into RAM at 0x80800000. Finally, the -j 0x80800000 argument tells X-Loader to jump into U-Boot rather than hanging around doing nothing afterward.

The output of the command looked normal so far. I plugged in my BeagleBoard, which didn’t have an SD card inserted and also had its NAND flash erased, so it had no bootloader installed and thus it would attempt a USB boot.

[+] successfully opened 0451:d009 (Texas Instruments OMAP3430)
[+] got ASIC ID - Num Subblocks [05], Device ID Info [01050134300757], Reserved [13020100], Ident Data [1215010000000000000000000000000000000000000000], Reserved [1415010000000000000000000000000000000000000000], CRC (4 bytes) [150901f7488f2800000000]
[-] fatal transfer error (BULK_OUT) for 26956 bytes (0 made it): LIBUSB_ERROR_PIPE
[-] failed to send file 'x-load.bin' (size 26956)
[-] failed to transfer the first stage file 'x-load.bin'

Darn. The utility recognized the BeagleBoard being plugged in, but libusb errored out with a pipe error. Long story short, I messed around with a few other computers, and I found that a few of my older computers, old enough that they didn’t have USB 3.0 ports on their motherboards, actually worked perfectly fine with omap_loader. I couldn’t get it to work properly with most of my modern machines though, AMD or Intel.

I thought this would be a great application for a USB sniffer, so I decided to record some traces of success versus failure.

Here’s a link to my in-depth investigation comparing success versus failure on the GitHub issue about this problem. Yep, it turns out I wasn’t the only one running into this exact same issue. Grant himself was seeing similar problems, and had come to a similar conclusion that it seemed to be machine-dependent. Other people had mentioned that adding delays at certain points in the code seemed to help. I was intrigued, so I tried to get to the bottom of it.

Here’s what the USB boot process is supposed to look like, according to TI’s OMAP35x Technical Reference Manual:

  • The OMAP device enumerates as a USB device.
  • Within 300 ms, the host needs to read an “ASIC ID” structure from the OMAP or else it will disconnect from USB.
  • Then, the host sends a 4-byte command: 0xF0030002 means to continue booting through USB.
  • Next, the host sends the 4-byte length of bootloader data it wants to transfer.
  • Finally, the host sends the bootloader (X-Loader in this case), which will be loaded into internal SRAM starting at 0x40200000.
  • After the OMAP device receives all of the data, it runs the received bootloader by jumping to 0x40200000.

Again, this process worked perfectly fine on my older computers that don’t support USB 3.0, but on my newer computers with USB 3.0, it was hanging up. I did notice that the newer computers were trying to fit a lot more data into a single USB frame. For example, the start of my older computer’s communication with the OMAP looked like this:

  • Frame 1
    • Host sends boot command
    • OMAP confirms it
  • Frame 2
    • Host sends length
    • OMAP confirms it
  • Frame 3
    • Host sends first packet of bootloader data
    • OMAP confirms it
    • Host sends second packet of bootloader data
    • OMAP says it’s not ready
    • Host pings
    • OMAP says it’s ready now
    • Host sends second packet of bootloader data again
    • OMAP confirms it

And then from that point on, it was just a process of sending the rest of the data like that. About 5 data packets would fit into each frame. My newer computer’s traffic looked like this instead:

  • Frame 1
    • Host sends boot command
    • OMAP confirms it
    • Host sends length
    • OMAP says it’s not ready
    • Host pings a few times until the OMAP is ready
    • Host sends length again
    • OMAP confirms it
    • Host sends first packet of bootloader data
    • OMAP confirms it
    • Host sends second packet of bootloader data
    • OMAP says it’s not ready
    • Host pings several times, OMAP never says it’s ready during the rest of this frame
  • Frame 2
    • Host pings
    • OMAP responds with a STALL packet

The newer xHCI host controller was trying its best to efficiently squeeze a lot of packets into the first frame. Even though this is a pattern that should be perfectly valid to follow when communicating with a USB device, the OMAP bootloader was clearly not happy about something, and eventually sent a STALL packet before omap_loader made much progress. Various USB packet traces on different modern computers revealed similar issues. It would either STALL after the second packet, or just NAK forever and never accept additional incoming data.

Inspired by other comments about adding delays, I tried to work around this by inserting an artificial 1 ms delay before every libusb_bulk_transfer() call. This would force modern machines to slow down a little bit. As soon as I added those delays, all of my new computers had no trouble uploading X-Loader to the OMAP. So yeah, I think the OMAP just doesn’t like receiving USB data too quickly.

That wasn’t the end of this little project, though. The 1 ms delay fixed the issue with getting X-Loader to run, but the newer computers also ran into problems while trying to upload U-Boot through X-Loader!

[-] device timed out while transfering in 512 bytes (got 0)
[-] device timed out while transfering in 512 bytes (got 0)
[-] device timed out while transfering in 512 bytes (got 0)
[-] failed to read command from X-Loader
[-] failed to transfer the additional files in to memory

Rats. I went back to the USB sniffer for more research.

This time, it was a different problem. I found the point where the host would try to read the initial request from X-Loader: USBf. On my older computer, this worked fine; it received a 13-byte string from X-Loader: USBffile req followed by a null terminator. It was happy with this, and omap_usbload kept going on with the rest of the file load process and everything succeeded.

On the newer computer, some shenanigans were going on. Let’s look at the USB trace in depth:

The 335-byte packet contains 332 actual bytes of data (the packet ID and CRC account for the other 3 bytes), and is the final chunk of X-Loader. It was successfully received and confirmed by the OMAP with an ACK. At that point, we can assume that the OMAP has begun jumping into X-Loader to start it up.

A millisecond later (due to the delay I added), we start trying to read from X-Loader. It’s clearly too soon, though; I don’t think X-Loader has finished starting up yet. There’s nothing ready to read. So these IN/NAK packets continue on for about 5 more milliseconds, which is totally normal. But then, something finally happens: the OMAP stops responding to our IN packets. My computer’s USB host controller tries three times (see the three IN packets in a row below?) and then it gives up. I’m guessing this is around the same time that X-Loader is doing its own hardware initialization, so maybe the OMAP’s USB controller is temporarily disabled.

This all makes sense so far. We tried to read too quickly before X-Loader finished starting up, so when it did finally load, there was a brief moment where it would not respond to IN packets. The host controller didn’t like this and stopped trying, so all we saw from that point on was SOF packets because we weren’t attempting any more USB reads. Some of my other computers gave up after 15 unanswered IN packets instead of 3. I’m not sure if that’s a difference in the host controller or what, but it’s the same root problem.

You may be wondering: why didn’t the older computers run into this same issue? They were also trying to talk with X-Loader too early, so why wouldn’t they run into this same roadblock? The answer is that their older host controllers are more tolerant of the missing NAKs. I recorded a similar trace with one of my older computers that works fine without any patches to omap_loader. It also immediately began sending IN packets trying to read from X-Loader way too soon. Just like the problematic computers, it experienced a brief period where the OMAP stopped responding to INs with NAKs. The difference is that it didn’t abandon hope so quickly. There were 33 unanswered IN packets. After that, the OMAP continued responding with NAKs again and everything was fine from that point on. 17 ms after we had originally finished sending out X-Loader, the OMAP finally responded with an actual data packet. So that’s the total time it took for X-Loader to launch.

Back to the newer computers that weren’t playing nicely. I was still confused. Even though this is a problem with newer host controllers, omap_loader has a retry mechanism! If it fails to read X-Loader’s initial data, it will try again 2 seconds later. You’d think it would succeed at that point. Let’s see what happens:

Ah, interesting. The retry is actually 3 seconds later. I’m guessing the first failed read attempt had a 1-second timeout, so then with a 2-second retry timer, that adds up to 3 seconds total.

Anyway, something funky happens here. The USB host finally reads the 13-byte string from X-Loader as a DATA1 packet (again, it shows up as 16 bytes because of the packet ID and CRC). The host then acknowledges reception with an ACK, but for some bizarre reason, it immediately continues attempting to read more data! I won’t show the whole trace, but the host keeps polling with IN packets for a whole second. And of course, they’re all NAKed. X-Loader knows it successfully sent data to us, so it has shifted over to waiting for the host to send an OUT packet instead. It’s like the host controller gets confused and expects X-Loader to send more data. The kernel never reports those 13 bytes back to libusb, even after the 1-second transfer timeout expires.

I don’t consider myself to be a USB expert, so maybe I’m misunderstanding something. This behavior just seems wrong, though. When my computer finally reads 13 bytes (proven by the sniffer trace shown above), why isn’t this data reported back to libusb? I would have expected the reception of a short DATA0/1 packet to cause the host controller to stop reading and return the data back immediately. Is this some kind of strange bug in the Linux kernel or the host controller hardware or something? I don’t know for sure. I find this behavior to be very odd, and I can’t explain it. My off-the-cuff guess is that the initial failure to respond to the three IN packets results in something getting out of sync in the host controller, but I really don’t know for sure. In my opinion, the retry should have worked, but clearly, something got confused. Not sure what. I don’t think it’s libusb’s fault, though.

I hate adding arbitrary delays in order to fix things, but a 20-millisecond delay between uploading X-Loader and attempting to read from it fixes this final issue. It ensures that the OMAP has been given ample time to launch X-Loader before we try reading from it, preventing the host controller from encountering the weird situation with unanswered IN packets.

After all of this tinkering and patching that I did to get things to play nicely on newer machines, here is a successful run of omap_loader:

[+] successfully opened 0451:d009 (Texas Instruments OMAP3430)
[+] got ASIC ID - Num Subblocks [05], Device ID Info [01050134300757], Reserved [13020100], Ident Data [1215010000000000000000000000000000000000000000], Reserved [1415010000000000000000000000000000000000000000], CRC (4 bytes) [150901f7488f2800000000]
[+] uploading 'u-boot.bin' (size 777760) to 0x80800000
[+] jumping to address 0x80800000
[+] successfully transfered 2 files

Meanwhile, the following output pops up on the BeagleBoard’s UART:

Texas Instruments X-Loader 1.5.1 (Nov 15 2011 - 09:36:31)
Beagle Rev C4
Trying load from USB
USBLOAD_CMD_FILE total = 12 addr = 0x73425355 val = 0xbde20 val = 0x80800000
got file addr = 0x808bde20
USBLOAD_CMD_JUMP total = 8 addr = 0x6a425355 val = 0x80800000


U-Boot 2023.10 (May 25 2024 - 22:05:27 -0700)

OMAP3530-GP ES3.1, CPU-OPP2, L3-165MHz, Max CPU Clock 720 MHz
Model: TI OMAP3 BeagleBoard
OMAP3 Beagle board + LPDDR/NAND
I2C: ready
DRAM: 256 MiB
Core: 44 devices, 18 uclasses, devicetree: separate
NAND: 256 MiB
MMC: OMAP SD/MMC: 0
Loading Environment from NAND... *** Warning - bad CRC, using default environment

Beagle Rev C4
Timed out in wait_for_event: status=0000
Check if pads/pull-ups of bus are properly configured
No EEPROM on expansion board
OMAP die ID: 79b8000400000000040398da1401c009
Net: No ethernet found.
Hit any key to stop autoboot: 2

I believe the “Timed out in wait_for_event” error is harmless. Anyway, success! It loads U-Boot! You can imagine that I could have easily transmitted a Linux kernel and initramfs as well, and fully booted this thing over USB. Once U-Boot is running, I can do whatever I want.

With these simple delay tweaks, omap_loader works great on all modern computers I’ve thrown at it, including Raspberry Pis. The only “gotcha” I’ve encountered is that some slower computers (my i3-7100U laptop and a Raspberry Pi Zero) don’t forward the USB hotplug event through udev quickly enough before the BeagleBoard decides it’s not being asked to boot over USB. omap_loader never gets past scanning for a device, even though the dmesg log clearly shows that it was detected:

[4076310.258842] usb 11-5: new high-speed USB device number 65 using xhci_hcd
[4076310.407944] usb 11-5: unable to get BOS descriptor or descriptor too short
[4076310.410041] usb 11-5: New USB device found, idVendor=0451, idProduct=d009, bcdDevice= 0.00
[4076310.410046] usb 11-5: New USB device strings: Mfr=33, Product=37, SerialNumber=0
[4076310.410051] usb 11-5: Product: OMAP3430
[4076310.410054] usb 11-5: Manufacturer: Texas Instruments
[4076310.710703] usb 11-5: USB disconnect, device number 65

As you can see, it’s a very short timeframe; just like TI’s manual says, it only stays connected for about 300 ms if it doesn’t hear from the host. I guess that’s not enough time for udev on some computers. The only solution I found for this issue on my slower machines was to compile a custom version of libusb with udev disabled, which forces it to directly use netlink for hotplug detection instead.

My patch also limits libusb transfers to 512 bytes at a time. I don’t think this change is critical, though. It fixed an issue I ran into where my bus was really loaded and libusb reported a memory error. I don’t think it actually helps anything in most cases as long as people aren’t performing crazy big USB transfers at the same time.

In summary:

  • Trying to write USB data to the OMAP’s on-chip bootloader too quickly seems to hit some edge cases that it doesn’t handle correctly. A 1 ms delay fixes this.
  • Trying to read from X-Loader before it’s ready to go irritates newer USB host controllers when they send out several IN packets without receiving any response (not even a NAK). A 20 ms delay fixes this.
    • Even retries afterward fail; the host controller gets out of sync due to the unanswered IN packets or something like that.
  • On some slower computers, udev doesn’t give you enough time to respond to the OMAP’s 300 ms timeout, so libusb never detects the hotplug. This can be solved with a custom libusb that uses netlink instead of udev.

I opened up a PR to submit these fixes (except for the udev thing) upstream to omap_loader in 2024. Why am I writing about this now? Well, remember when I mentioned Nest earlier? Google ended support for older Nest thermostats last month, which renewed some interest in merging my reliability improvements so that people can flash custom firmware to their Nest thermostats. Those old Nest devices also use OMAP processors.

What it boils down to is: all this tinkering I did last year with pointlessly booting old BeagleBoards over USB accidentally ended up being useful. It helped out some Nest thermostat revival projects that have been popping up in the last month. So I thought now might be a fun time to talk about my tiny involvement with that. Yay! It’s always fun when a random side project unexpectedly helps other people.

Trackback

no comments

Add your comment now