This is a continuation of my previous post about upgrading the old 2.6.28 Linux kernel that came with my Chumby 8. In that post, I got a modern U-Boot working with SD card support, which is what I needed in order to boot Linux.

After I finished getting U-Boot working, I began work on the kernel. I based my work on the stable kernel version 5.15.33. I started by compiling a kernel using the bundled pxa168_defconfig file. I created a device tree file called pxa168-chumby8.dts based on pxa168-aspenite.dts. It needed a few tweaks. I specified the correct amount of RAM for the Chumby 8 (128 MB) and changed the model and compatible strings. I also disabled “twsi1” which is an I2C host. I wasn’t ready to deal with I2C yet. Here’s a small snippet of the relevant changed parts of my new device tree file.

model = "Chumby Industries Chumby 8";
compatible = "chumby,chumby8", "mrvl,pxa168";
memory {
	reg = <0x00000000 0x08000000>;
};
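As a quick sanity check on the memory node: the second cell of reg is the size in bytes, and 0x08000000 should work out to 128 MiB. Here’s a minimal sketch of that arithmetic, assuming one address cell and one size cell (which is what the snippet above implies). The helper name is made up for illustration.

```python
def reg_to_range(base, size):
    """Return (start, end_exclusive, size_in_mib) for a <base size> reg pair.

    Assumption: #address-cells = 1 and #size-cells = 1, so reg holds exactly
    one base cell followed by one size cell.
    """
    return base, base + size, size // (1024 * 1024)


start, end, mib = reg_to_range(0x00000000, 0x08000000)
print(f"RAM: 0x{start:08x}-0x{end - 1:08x} ({mib} MiB)")
# prints: RAM: 0x00000000-0x07ffffff (128 MiB)
```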

I integrated all of this into buildroot so that the generated kernel (uImage) and the pxa168-chumby8.dtb device tree blob would end up on the SD card, where I could load them from U-Boot. Then I played around with booting the kernel from the U-Boot prompt:

ext4load mmc 0:2 0x1000000 /boot/uImage
ext4load mmc 0:2 0x2000000 /boot/pxa168-chumby8.dtb
setenv bootargs 'console=ttyS0,115200 root=/dev/mmcblk0p2 rootwait ro ignore_loglevel'
bootm 0x1000000 - 0x2000000

As a brief summary, the list of commands above loads uImage from the SD card (partition 2) into RAM at 0x1000000 as well as the device tree blob at 0x2000000. It sets a few kernel command line arguments and then attempts to boot the kernel. I didn’t expect the SD card to work yet because I hadn’t added it into the device tree, but I set it up in the command line so it would be ready to go when I did get around to adding it to the device tree.
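For context on what bootm does with the first address: a legacy uImage is the kernel payload with a 64-byte big-endian header in front of it, and bootm reads the load address, entry point and checksums from that header. Here’s a rough Python sketch of the layout. It builds a tiny synthetic image and parses it back. The function names and the demo payload are invented for illustration, and in real life mkimage creates the header.

```python
import struct
import zlib

# Legacy uImage header: 7 x 32-bit words, 4 single-byte IDs, 32-byte name.
HEADER = struct.Struct(">7I4B32s")
MAGIC = 0x27051956


def build_header(payload, load, entry, name):
    """Build a legacy header for payload (illustration only)."""
    ids = (5, 2, 2, 0)  # OS = Linux, arch = ARM, type = kernel, compression = none
    # magic, header CRC (zero for now), timestamp, size, load, entry, data CRC
    words = [MAGIC, 0, 0, len(payload), load, entry, zlib.crc32(payload)]
    raw = HEADER.pack(*words, *ids, name.encode())
    words[1] = zlib.crc32(raw)  # header CRC is computed with its own field zeroed
    return HEADER.pack(*words, *ids, name.encode())


def parse_header(image):
    """Check magic and header CRC, then return the fields bootm cares about."""
    magic, hcrc, _time, size, load, entry, dcrc, *_ids, name = HEADER.unpack(image[:64])
    if magic != MAGIC:
        raise ValueError("not a legacy uImage")
    zeroed = image[:4] + b"\0" * 4 + image[8:64]
    if zlib.crc32(zeroed) != hcrc:
        raise ValueError("bad header CRC")
    return {
        "name": name.rstrip(b"\0").decode(),
        "size": size,
        "load": load,
        "entry": entry,
        "data_crc_ok": zlib.crc32(image[64:64 + size]) == dcrc,
    }


payload = b"\x00" * 16  # stand-in for the real kernel binary
image = build_header(payload, 0x8000, 0x8000, "Linux-demo") + payload
info = parse_header(image)
print(info)
```

The 64-byte header is also why the file read from the SD card is 64 bytes larger than the “Data Size” that bootm reports.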

At this point, what I wanted to see was a kernel boot ending with “No init found” or a missing root device or something along those lines. Instead, I just got the normal U-Boot kernel loading output ending with “Starting kernel …” and then nothing!

2598896 bytes read in 474 ms (5.2 MiB/s)
3228 bytes read in 21 ms (149.4 KiB/s)
## Booting kernel from Legacy Image at 01000000 ...
   Image Name:   Linux-5.15.33
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    2598832 Bytes = 2.5 MiB
   Load Address: 00008000
   Entry Point:  00008000
   Verifying Checksum ... OK
## Flattened Device Tree blob at 02000000
   Booting using the fdt blob at 0x2000000
   Loading Kernel Image
   Loading Device Tree to 07998000, end 0799bc9b ... OK

Starting kernel ...

Well that’s not good! Luckily I have a lot of experience with board bringup and running into this exact problem, so I didn’t panic. This could be caused by a lot of things. Maybe I don’t have the UART mapped properly. Or maybe something’s crashing before the UART driver loads, in which case I could turn on the earlyprintk functionality to make sure I’m seeing everything. I went into the menuconfig to check on earlyprintk and realized that the stock pxa168_defconfig file was all screwed up. Doing a quick search in the menuconfig for PXA168 led me to realize that CPU_PXA168 was not set. Furthermore, the dependencies in menuconfig told me that I needed an ARCH_MULTI_V5 kernel, which makes sense because the PXA168 is an ARMv5 processor. But it was enabling ARCH_MULTI_V7 by default! That’s no good. So I disabled ARCH_MULTI_V7, and ARCH_MULTI_V5 suddenly showed up as active. This allowed me to navigate to the CONFIG_ARCH_MMP category where I enabled CONFIG_MACH_MMP_DT since I was trying to use it with a device tree.

After all that, I realized that CPU_PXA168 still wasn’t set, but that’s okay. It seems to be a legacy internal config that is only used with the legacy board files such as the Aspenite and GPlugD boards. I’m glad it existed, because it helped me realize my kernel wasn’t configured properly for the PXA168 despite using pxa168_defconfig! Anyway, I finished up by enabling earlyprintk, saving my kernel changes, and rebuilding. Luckily, the kernel config defaulted to the correct base address for the debug UART, so I didn’t have to change anything in order to enable earlyprintk (other than adding earlyprintk to my kernel command line).

I booted again, and it finally worked! I won’t bore you to death with a huge log of Linux console output showing the entire boot process, but it ended like this:

[    0.326511] d4017000.serial: ttyS0 at MMIO 0xd4017000 (irq = 43, base_baud = 921600) is a UART1
[    0.338167] printk: console [ttyS0] enabled
[    0.338167] printk: console [ttyS0] enabled
[    0.346571] printk: bootconsole [earlycon0] disabled
[    0.346571] printk: bootconsole [earlycon0] disabled
[    0.356561] NET: Registered PF_PACKET protocol family
[    0.358293] Key type dns_resolver registered
[    0.367799] XScale iWMMXt coprocessor detected.
[    0.372867] Waiting for root device /dev/mmcblk0p2...

Boom! I had the kernel at least starting to boot. It was just waiting for the SD card to show up because I added the “rootwait” command line parameter. This would never succeed because I didn’t have it in the device tree yet. Removing rootwait from my command line resulted in the type of error I expected to see all along:

[    0.372824] /dev/root: Can't open blockdev
[    0.372824] VFS: Cannot open root device "mmcblk0p2" or unknown-block(0,0): error -6
[    0.372824] Please append a correct "root=" boot option; here are the available partitions:
[    0.372824] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

Perfect! Next it was time to get the SD controller working, so I could boot into my minimal buildroot environment. Little did I know, this was going to lead me into a fun little rabbit hole.

I knew based on my old patches from 2013 that the SDHC controller driver needed some changes for PXA168 support. By the way, there had been some attempts in 2011 through 2013 to get the PXA168 SDHC driver working in the mainline kernel, but the patches didn’t pass code review and were never re-submitted. Rather than just blindly applying the fixes I had used in 2013, I decided to see what would happen if I simply enabled the existing sdhci-pxav2 driver. After all, that approach mostly worked when I enabled it with U-Boot! So I started off by enabling the correct SDHC controller in my dts file. This required looking up I/O addresses, interrupt numbers, required clock names in the driver, etc.:

soc {
	axi@d4200000 {
		sdhci3: sdhci@d427e000 {
			compatible = "mrvl,pxav2-mmc";
			reg = <0xd427e000 0x1000>;
			interrupts = <40>;
			clocks = <&soc_clocks PXA168_CLK_SDH2>;
			resets = <&soc_clocks PXA168_CLK_SDH2>;
			clock-names = "PXA-SDHCLK";
			mrvl,clk-delay-cycles = <0x1F>;
			non-removable;
			no-1-8-v;
			bus-width = <4>;
			max-frequency = <24000000>;
			cap-sd-highspeed;
			status = "okay";
		};
	};
};

Ideally this would go into the mainline kernel’s pxa168.dtsi file with a status of disabled (along with the other three SDHC controllers), but I just wanted to get it working. I enabled CONFIG_MMC_SDHCI_PXAV2, which required enabling CONFIG_MMC, CONFIG_MMC_SDHCI, and CONFIG_MMC_SDHCI_PLTFM. Then I recompiled and tested the kernel once more. Success at this point would look like a buildroot login prompt:

Welcome to Buildroot
buildroot login: 

Oh, if only I could have been that lucky. Instead, I got nothing. The boot process just froze after:

[    0.390237] XScale iWMMXt coprocessor detected.

WTF? At this point there were a lot of different things I could try. I settled on littering printk statements throughout the sdhci-pxav2 driver to figure out what was going on. The hang had to be caused by the driver I had just enabled, right? One cool thing I learned while debugging this issue is you can add #define DEBUG to the top of files in the kernel and they will print out useful debugging info. Adding it to the top of sdhci.c gave me some useful output:

[    0.380493] mmc0: sdhci: Version:   0x00000001 | Present:  0x01fa0000
[    0.387387] mmc0: sdhci: Caps:      0x05f832b2 | Caps_1:   0x00000000
[    0.393850] mmc0: sdhci: Disabling ADMA as it is marked broken
[    0.399762] Key type dns_resolver registered
[    0.399903] XScale iWMMXt coprocessor detected.
[    0.408849] mmc0: sdhci: Auto-CMD23 unavailable
[    0.408861] mmc0 bounce up to 128 segments into one, max segment size 65536 bytes

Further printk debugging proved to me that the hang was occurring at a call to mmc_delay() in drivers/mmc/core/core.c. This baffled me. Why would a delay function hang forever? mmc_delay() is a simple inline function that calls either usleep_range() or msleep(). These are pretty important functions in the kernel that shouldn’t hang forever. My mind immediately jumped to some kind of a timer problem, but I proved to myself through debug statements that the timer was indeed running and the jiffies count was going up. This problem was super frustrating to me. I knew this had nothing to do with any of the PXA168 SDHC patches I had skipped because none of them would cause a core Linux sleep function to hang.
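To make the “either usleep_range() or msleep()” part concrete, here’s a tiny Python model of the choice mmc_delay() makes. The 20 ms threshold and the 1000/1250 multipliers are my assumption of what the 5.15-era helper looks like, so check drivers/mmc/core/core.h in your own tree before relying on them. The function name mmc_delay_plan is made up.

```python
def mmc_delay_plan(ms):
    """Model which kernel sleep primitive mmc_delay(ms) would pick.

    Assumption: short delays use usleep_range() with a 25% slack window,
    longer ones use msleep(). The exact numbers may differ between kernels.
    """
    if ms <= 20:
        # hrtimer-based sleep, arguments in microseconds (min, max)
        return ("usleep_range", ms * 1000, ms * 1250)
    # timer-wheel based sleep, argument in milliseconds
    return ("msleep", ms)


print(mmc_delay_plan(10))
print(mmc_delay_plan(50))
```

Either way, the sleeping task only wakes up when a timer fires, which is why a hang here points at timers rather than at the MMC code.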

The thing that scared me was I had no idea what level this bug might be at. Was the kernel just completely broken on PXA168? Was the SD controller driver corrupting something? At this point I had to let it go and sleep on it. I had made so much progress, but didn’t have much to show for it yet!

The next day, I came back with some fresh ideas. Logical ideas. First of all, I decided to try to eliminate the SD controller driver altogether by booting into a CPIO rootfs supplied by U-Boot instead, e.g.:

ext4load mmc 0:2 0x3000000 /rootfs.cpio.uboot
setenv bootargs 'console=ttyS0,115200 root=/dev/ram0 ignore_loglevel'
bootm 0x1000000 0x3000000 0x2000000

What I found was the kernel was still eventually hanging on delay/wait functions, just somewhere else instead. It was hanging up in rcu_barrier() inside of mark_readonly() in init/main.c. I noticed that this could be skipped by disabling CONFIG_STRICT_KERNEL_RWX, so I tried that — which led to further boot progress, but it still resulted in a hang in getty. getty was doing something that eventually led to a call to hrtimer_nanosleep and hanging. It was like the scheduler wasn’t working properly. This was good news — the problem was unrelated to the SD controller.

The other good news was that I had definitely seen it work in Linux 3.13 in the past. So I decided to try the only logical solution I could think of: a git bisect. I did a little bit of manual bisecting work to find which kernel version introduced the problem. This was a bit tricky because older kernels didn’t like to build with my newer toolchain (this patch was useful to apply to those old kernels). What I found was that the problem was introduced sometime between Linux 4.1 and 4.2. I performed a git bisect between v4.1 and v4.2, which eventually pointed me to this commit as the culprit:
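If you haven’t used git bisect before, it is just a binary search over the commit history between a known-good and a known-bad point. Here’s a minimal sketch of the idea in Python. The commit list and the culprit position are invented for illustration, and the sketch assumes a linear history that flips from good to bad exactly once.

```python
def first_bad(commits, is_bad):
    """Return (first bad commit, number of builds tested).

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and the history goes from good to bad once.
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1  # each step means one kernel build and one boot test
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad], tests


# Hypothetical history: 12,000 commits between the good and bad releases.
history = list(range(12001))
culprit, builds = first_bad(history, lambda commit: commit >= 7777)
print(f"first bad commit: {culprit}, found after {builds} builds")
```

The payoff is that even thousands of commits only need a dozen or so build-and-boot cycles, which is what makes this practical on slow hardware.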

clk: mmp: add timer clock for pxa168/mmp2/pxa910

Prior to this commit, the timer clock wasn’t being controlled by the kernel at all, so it was staying on because U-Boot turned it on. But now, the kernel had the ability to turn it off and on. I would be lying if I said it clicked in my head immediately, but I was eventually able to deduce that the timer clock was being turned off by Linux during the boot process because it was unused in the device tree. This made perfect sense based on the symptoms I was seeing. I had proven earlier that the timer was running, but I didn’t realize it was being intentionally turned off later in the boot process.

I later learned that I could have added the clk_ignore_unused parameter to the kernel command line to inhibit this behavior for testing, but the real fix was to make sure the timer clock was actually hooked up in pxa168.dtsi. All of the other peripherals had their clocks hooked up; for some reason, the timer’s clock was missing. By the way, I discovered that this exact same issue had already been fixed for the mmp2 in 2018.
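Here’s a toy Python model of what was going on, in case it helps. This is not the kernel’s clock framework, just a sketch of the behavior described above: clocks that are running but were never claimed by a driver get gated late in boot, unless clk_ignore_unused is set. All class and function names are hypothetical.

```python
class Clock:
    def __init__(self, name, on_at_boot):
        self.name = name
        self.hw_on = on_at_boot   # state left behind by the bootloader
        self.enable_count = 0     # how many drivers have claimed this clock

    def enable(self):
        self.enable_count += 1
        self.hw_on = True


def disable_unused(clocks, ignore_unused=False):
    """Gate every clock that is running but has no consumer."""
    if ignore_unused:
        return []
    gated = [c for c in clocks if c.hw_on and c.enable_count == 0]
    for clock in gated:
        clock.hw_on = False
    return [c.name for c in gated]


def boot(timer_in_dt, ignore_unused=False):
    """Return True if the timer clock is still running after late init."""
    uart, timer = Clock("uart1", True), Clock("timer", True)
    uart.enable()                 # the UART node references its clock
    if timer_in_dt:
        timer.enable()            # only happens if the timer node lists a clock
    disable_unused([uart, timer], ignore_unused)
    return timer.hw_on


print(boot(timer_in_dt=False))                      # timer gets gated
print(boot(timer_in_dt=True))                       # timer survives
print(boot(timer_in_dt=False, ignore_unused=True))  # workaround for testing
```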

This fix took a while to get accepted in the mainline kernel because the PXA168 isn’t maintained anymore. I originally submitted it in June, but didn’t receive any responses. I eventually asked in the #armlinux IRC channel and a friendly maintainer there helped guide me in the right direction. The fix was quickly picked up at that point and is slated to be included in Linux 6.2. I probably should have put a “Fixes:” tag on this commit to help get it backported to stable kernels, but it likely doesn’t matter anyway — it’s clear that people haven’t been using the PXA168 with modern kernels.

With that big distraction out of the way, I was finally able to return my focus to the sdhci-pxav2 driver. Could I get the SD card working properly? With the timer fixed, I attempted to boot again:

[    0.415018] mmc0: SDHCI controller on d427e000.sdhci [d427e000.sdhci] using DMA
[    0.422602] Waiting for root device /dev/mmcblk0p2...
[    6.457463] random: fast init done
[   10.777463] mmc0: Timeout waiting for hardware cmd interrupt.
[   10.777474] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[   10.783237] mmc0: sdhci: Sys addr:  0x00000000 | Version:  0x00000001
[   10.789677] mmc0: sdhci: Blk size:  0x00000000 | Blk cnt:  0x00000000
[   10.796126] mmc0: sdhci: Argument:  0x00000c00 | Trn mode: 0x00000000
[   10.802576] mmc0: sdhci: Present:   0x01fa0000 | Host ctl: 0x00000001
[   10.809026] mmc0: sdhci: Power:     0x0000000f | Blk gap:  0x00000000
[   10.815476] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000000
[   10.821926] mmc0: sdhci: Timeout:   0x00000000 | Int stat: 0x00000000
[   10.828375] mmc0: sdhci: Int enab:  0x00ff0003 | Sig enab: 0x00ff0003
[   10.834825] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[   10.841274] mmc0: sdhci: Caps:      0x05f832b2 | Caps_1:   0x00000000
[   10.847716] mmc0: sdhci: Cmd:       0x0000341a | Max curr: 0x00000000
[   10.854164] mmc0: sdhci: Resp[0]:   0x00000000 | Resp[1]:  0x00000000
[   10.860605] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
[   10.867047] mmc0: sdhci: Host ctl2: 0x00000000
[   10.873496] mmc0: sdhci: ============================================

Progress! I thought it was going to hang, but it started spitting out messages like this every 10 seconds or so. I did notice that the most recent attempt at mainlining PXA168 support in this driver in 2013 enabled a couple of quirks for the PXA168: SDHCI_QUIRK_NO_BUSY_IRQ and SDHCI_QUIRK_32BIT_DMA_SIZE. Marvell’s driver from kernel 2.6.28 also enabled these quirks. The IRQ quirk in particular seemed like it could potentially be related to this problem, but enabling it didn’t fix it, so I left the quirks off.
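The register dump is easier to read once a couple of the values are decoded. Here’s a small Python sketch, assuming the standard SDHCI Command register layout and the standard SDIO CMD52 (IO_RW_DIRECT) argument layout. The helper names are mine, not something from the driver.

```python
RESPONSE_TYPES = {0: "none", 1: "136-bit", 2: "48-bit", 3: "48-bit with busy"}


def decode_command(reg):
    """Split a 16-bit SDHCI Command register value into its fields."""
    return {
        "index": (reg >> 8) & 0x3F,
        "type": (reg >> 6) & 0x3,
        "data_present": bool(reg & 0x20),
        "index_check": bool(reg & 0x10),
        "crc_check": bool(reg & 0x08),
        "response": RESPONSE_TYPES[reg & 0x3],
    }


def decode_cmd52_argument(arg):
    """Split an SDIO CMD52 (IO_RW_DIRECT) argument into its fields."""
    return {
        "write": bool(arg & 0x80000000),
        "function": (arg >> 28) & 0x7,
        "address": (arg >> 9) & 0x1FFFF,
        "data": arg & 0xFF,
    }


print(decode_command(0x341A))             # "Cmd:" value from the dump
print(decode_cmd52_argument(0x00000C00))  # "Argument:" value from the dump
```

If I’m reading it right, the dump shows CMD52 reading register 6 of function 0, which looks like the SDIO probe the MMC core tries at the very start of card detection. In other words, not even the first command was getting a response.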

Because I had run into clocking issues with the timer, I decided to take a closer look at the SDHC clock code in drivers/clk/mmp/clk-of-pxa168.c to make sure there weren’t any clocking issues. And…sure enough, the clocks were only set up in this file for the first two SDHC peripherals. The Chumby 8 uses the third peripheral for the SD card. So I added in the clock for the third peripheral and tried again.

[    0.415095] mmc0: SDHCI controller on d427e000.sdhci [d427e000.sdhci] using DMA
[    0.422675] Waiting for root device /dev/mmcblk0p2...
[    0.472226] mmc0: new SD card at address f259
[    0.477515] mmcblk0: mmc0:f259 SU02G 1.84 GiB 
[    0.486134]  mmcblk0: p1 p2
[    0.523469] List of all partitions:
[    0.523615] b300         1931264 mmcblk0 
[    0.527131]  driver: mmcblk
[    0.534031]   b301            9216 mmcblk0p1 00000000-01
[    0.534044] 
[    0.539405]   b302          204800 mmcblk0p2 00000000-02
[    0.540911] 
[    0.547781] No filesystem could mount root, tried: 
[    0.547794]  cramfs
[    0.559606] 
[    0.559613] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(179,2)

I’ve never been so happy to see a kernel panic in my life! It was detecting the SD card! I needed to enable the ext4 filesystem in my kernel config, because it wasn’t showing up in the “tried:” list above. As soon as I added that, I was mostly in business. It was booting, but really slowly. I was getting mmc0 timeout errors that caused long pauses. The errors looked just like the timeout error above with the register dump, but rather than “Timeout waiting for hardware cmd interrupt”, they said “Timeout waiting for hardware interrupt”. It turns out that SDHCI_QUIRK_NO_BUSY_IRQ is actually needed to fix a hardware interrupt timeout issue with the PXA168 — but clocking issues can also cause a similar error! Enabling that quirk fixed the problem. I also enabled the DMA quirk since Marvell’s kernel used it, although I didn’t observe any problems without it.
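One detail worth pointing out in that panic: this time the kernel reported unknown-block(179,2) rather than unknown-block(0,0), meaning it had actually resolved the root device. The hex IDs in the partition list map straight onto those major/minor numbers. A minimal sketch, assuming the short four-digit form shown in the log above (the function name is made up):

```python
def decode_dev(hex_dev):
    """Split a hex device number such as 'b302' into (major, minor).

    Assumption: this is the short form from the kernel's partition list,
    where the low 8 bits are the minor and the bits above are the major.
    """
    value = int(hex_dev, 16)
    return value >> 8, value & 0xFF


for dev in ("b300", "b301", "b302"):
    print(dev, decode_dev(dev))
```

Major 179 is the MMC block device major, and minor 2 is the second partition, which matches root=/dev/mmcblk0p2.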

After all of the fun clocking issues I dealt with, I decided that I needed to closely go over the code in clk-of-pxa168.c to see if I could find any other problems. I really didn’t want to be surprised by any more clock-related problems. Armed with the PXA16x Software Manual, I looked at register bits and parent clocks. I found several mistakes with incorrect parent clocks and incorrect register bits being controlled. I also added clocks for some missing peripherals. Additionally, I found that sdhci1/2 and sdhci3/4 share a common clock enable in the sdhci1 and sdhci3 clock registers, which required some special handling. These clocking fixes were submitted upstream in June, accepted in September, and landed in Linux 6.1.

The actual change to add PXA168 support to the sdhci-pxav2 driver is still a work in progress. I’ve submitted it upstream and have been asked to make a few changes (for the better!). I’m going through the code review process and hopefully will have that all figured out soon. A few of the patches for this driver actually deal with other issues I ran into while getting Wi-Fi working. I will cover all of the fun with Wi-Fi in the next post in this series.

Click here to go to part 3, where I get Wi-Fi working.


3 comments

  1. […] Doug Brown ☛ Upgrading my Chumby 8 kernel part 2: Initial Linux boot […]

  2. Ray Knight @ 2023-01-02 14:42

    Perhaps I missed it, but do you have a link to a repository with all of your Chumby related patches to uboot and the linux kernel?

  3. Hi Ray,

    I’ve put so many links in these posts that it’s hard to keep track of them all. I’m glad you asked, so I can lay it out right here:

    The U-Boot repository (branch = chumby8) is here:
    https://github.com/dougg3/u-boot

    The Linux repository (branch = v6.0.9-chumby8) is here:
    https://github.com/dougg3/silvermoon-linux
    However, I periodically rebase onto new kernels and make new branches, so eventually this branch will be out of date.

    The buildroot repository, which contains the board’s devicetree blob, (branch = chumby) is here:
    https://github.com/dougg3/buildroot
