I was looking at one of my classic Macs a few weeks ago, and noticed that my Ubuntu 18.04 netatalk server wasn’t showing up in the Chooser anymore. If you’re not familiar with netatalk, it’s an implementation of Apple Filing Protocol (AFP) that runs on Unix-like operating systems such as Linux and NetBSD. It allows other operating systems to act as Mac file servers. Version 2.x, which I use, supports the ancient AppleTalk protocol. This allows it to work with really old classic Macs that don’t even have a TCP/IP stack installed. Support for AppleTalk was removed in version 3.x, so that’s why I’m still using 2.x.

I checked out the server, and noticed that atalkd wasn’t running.

doug@miniserver:~$ ps ax | grep atalkd
3351 pts/0 R+ 0:00 grep --color=auto atalkd

Hmmm….why wouldn’t atalkd be running? I went ahead and tried to restart netatalk:

doug@miniserver:~$ sudo service netatalk restart
Job for netatalk.service failed because the control process exited with error code.
See "systemctl status netatalk.service" and "journalctl -xe" for details.

Uh oh! Why won’t netatalk start?

doug@miniserver:~$ systemctl status netatalk.service
● netatalk.service
Loaded: loaded (/etc/init.d/netatalk; generated)
Active: failed (Result: exit-code) since Sat 2020-08-01 10:29:36 PDT; 54s ago
Docs: man:systemd-sysv-generator(8)
Process: 1320 ExecStart=/etc/init.d/netatalk start (code=exited, status=1/FAILURE)

Aug 01 10:29:36 miniserver systemd[1]: Starting netatalk.service…
Aug 01 10:29:36 miniserver netatalk[1320]: Starting Netatalk services (this will take a while): socket: Address family not supported by protocol
Aug 01 10:29:36 miniserver netatalk[1320]: socket: Address family not supported by protocol
Aug 01 10:29:36 miniserver netatalk[1320]: atalkd: can't get interfaces, exiting.
Aug 01 10:29:36 miniserver systemd[1]: netatalk.service: Control process exited, code=exited status=1
Aug 01 10:29:36 miniserver systemd[1]: netatalk.service: Failed with result 'exit-code'.
Aug 01 10:29:36 miniserver systemd[1]: Failed to start netatalk.service.

This setup had been working forever. What could have possibly changed? I was keeping up to date with all of my Ubuntu updates. I had already needed to manually patch the netatalk binary due to another bug. Maybe I needed to reapply the patch? But no, the netatalk binary hadn’t been updated. That wasn’t it.

I tried some Googling and noticed that recently AppleTalk had been patched so that you can’t create raw sockets without the CAP_NET_RAW capability, so I fiddled with setcap to set that capability on the atalkd binary, but that didn’t seem to fix anything, so I undid all the capability changes I tested.

After further experimentation, I realized that the appletalk kernel module wasn’t being loaded:

doug@miniserver:~$ lsmod | grep appletalk
doug@miniserver:~$

Naturally, I tried to load it myself:

doug@miniserver:~$ sudo modprobe appletalk
modprobe: ERROR: could not insert 'appletalk': Cannot allocate memory

Aha! There’s the real problem. Why can’t it allocate memory? I wondered if it was something specific to this particular machine. To test my theory, I headed over to my desktop Linux machine and ran the same modprobe command. It failed with the exact same error.

At this point after trying to do more research, I gave up for a while because I had more important stuff to worry about. It’s kind of difficult to search for info about this type of problem, because hardly anybody is using the AppleTalk networking layer in Linux anymore. “There are dozens of us!”

I finally came back and did some more troubleshooting. Since I knew it had worked before, I tried installing various kernel versions in a VM. Sure enough, Ubuntu’s 5.0.0 kernel worked fine. So this was definitely a kernel issue if I wasn’t already convinced.

Next, I tried a bunch of upstream kernel versions. I narrowed the problem down to sometime between kernels 5.0 and 5.1-rc1. Then I ran a git bisect between those versions, following the instructions on the Ubuntu wiki for bisecting upstream kernels. I also used “make localmodconfig” (followed by enabling appletalk in “make menuconfig”) to speed up the compile process after I noticed that most of the compile time was being spent building kernel modules that I wouldn’t be loading anyway.

The bisect process took quite a while. I probably should have figured out a way to automate it with qemu using a strategy similar to the one used in this excellent blog post. But nevertheless, it finally settled on this commit from March 2019 being the start of the problem:

[6377f787aeb945cae7abbb6474798de129e1f3ac] appletalk: Fix use-after-free in atalk_proc_exit

This commit simply does a better job of checking return values of the functions called by atalk_init, and cleaning up properly if they fail. In particular, the return values of these functions, which were previously ignored, are now checked to ensure they succeed:

  • sock_register
  • register_netdevice_notifier
  • atalk_proc_init
  • atalk_register_sysctl

Further inspection revealed atalk_proc_init as the real culprit. A refactor of atalk_proc_init, which happens to be the previous commit to the one linked above, accidentally left the code in a state where it would return -ENOMEM instead of 0 on success. So it always returns -ENOMEM, regardless of success or failure. This explains the “Cannot allocate memory” error being reported when I attempted to insert the appletalk module.

Armed with this info, I did something really disgusting. I made a copy of the kernel module in /tmp and hacked it by tweaking bytes in a hex editor. A disassembly of atalk_proc_init reveals a line of code that loads a value of 0xFFFFFFF4 (-12) into the EAX register just before it exits. ENOMEM is defined as 12. So this is the line that’s causing it to return -ENOMEM. I simply hacked this line to load 0 into EAX instead. This basically leaves the logic working the same way it used to work before the two aforementioned patches were applied, because the return value was previously being ignored anyway.

Before:

294:   b8 f4 ff ff ff          mov $0xfffffff4,%eax

After:

294:   b8 00 00 00 00          mov $0x0,%eax

This hack by itself didn’t solve the problem:

doug@miniserver:~$ sudo insmod /tmp/appletalk.ko
insmod: ERROR: could not insert module /tmp/appletalk.ko: Key was rejected by service

The problem here is that Ubuntu’s kernel modules are signed. After hacking the binary, it no longer matched its signature. This is an indicator of just how ugly my hack is. So I did something even uglier: I stripped the signature out of the module completely:

doug@miniserver:~$ strip --strip-debug /tmp/appletalk.ko

Then, I tried loading it again:

doug@miniserver:~$ sudo insmod /tmp/appletalk.ko
doug@miniserver:~$

Success! My dmesg log reports an error about a failed signature, thus tainting the kernel:

[ 4479.495054] appletalk: module verification failed: signature and/or required key missing - tainting kernel

But this doesn’t matter for my purposes. netatalk works now:

doug@miniserver:~$ sudo service netatalk restart
doug@miniserver:~$ ps ax | grep atalkd
1698 ? S 0:00 /usr/sbin/atalkd
1716 pts/0 S+ 0:00 grep --color=auto atalkd

Now I can store the hacked version of the module in the correct subdirectory in /lib/modules, and everything works automatically when I reboot. Yay!

This is a really ugly fix though. Whenever I upgrade my kernel, I’m going to have to manually patch it. That is, until the patch hits the mainline kernel, and then I can hope that Ubuntu pulls the fix into their kernel. The reason I’m documenting the binary patch here is because realistically, it’s going to take forever for the kernel fix to get released. I have no idea if it will be possible to convince Ubuntu to pull this fix into their kernels once it’s released. It’s not a high-priority bug fix because nobody uses AppleTalk anymore. And I don’t want to limit myself to an old kernel just for this silly reason.

So the first step here is to submit the patch to the linux-netdev mailing list, and get the fix merged into the mainline kernel. I searched the mailing list, and discovered that I wasn’t the first one to run into this bug. Actually, two separate people have both tried to fix it, but had their patches rejected for minor reasons:

Obviously not too many people are using the appletalk kernel module these days, or else people would be up in arms about how it has been broken since kernel 5.1. I would offer up my own third attempt at getting this patch merged, but since two attempts were made just last month, I suspect one of them will succeed shortly. I hope so, anyway. In the meantime, I guess I’ll just continue binary patching my kernel module.

I think there is an even bigger long-term solution than this kernel fix. Classic Mac users who use netatalk with AppleTalk need to join forces to address a few things. netatalk 2.x is dead, but the last release of it is broken on Linux, at least the AppleTalk portion of it — which is the only reason I would still use it. Maybe we should fork netatalk 2.x? Also, I’m pretty sure the AppleTalk subsystem in the Linux kernel is full of little issues. It works well enough for most people who need netatalk support, but I know it is broken in other ways. Maybe this project idea for a portable userspace AppleTalk stack will gain some traction going forward. If it ever does, perhaps we could add support for it in a netatalk fork.

Anyway, I thought it might be interesting to share this troubleshooting journey and the eventual resolution. I just tested connecting to my Ubuntu 18.04 server from my old Mac IIci running System 7.1, and everything works perfectly again…until the next kernel upgrade, that is!

Trackback

15 comments

  1. […] Hacking up a fix for the broken AppleTalk kernel module in Linux 5.1 and newer (downtowndougbrown.com) […]

  2. I’m the guy who tried (twice!) to get this fixed in the main kernel tree.

    It looks like I should have just posted about it on my blog; less than twenty-four hours after you wrote about this, I received an email (cc’d to you and Vincent) saying that it had been fixed. Would have been much easier than jumping through the hoops that the kernel cabal now require.

    Going forward, I recommend spinning up a NetBSD virtual machine for AFP. The Appletalk code is rock-solid, the dev team are reasonable (but I’m biased: I’m a former NetBSD developer), and they actually require that code be tested before it’s blindly committed rather than block break-fixes because the reporter didn’t use the short-form commit reference ID.

    Anyway … thanks for writing this up. It caused the right thing happen.

  3. betaporter @ 2020-08-02 06:59

    Excellent write-up. Also, fairly damning to reject a completely minor fix because of a lack of adherence to specific pleasantries.

    I looked at the proposed patch thread and cannot determine what the pedants are complaining about, or what to do about it. Ridiculous.

  4. @betaporter:

    Patches containing fixes (versus feature work) should contain a “Fixes” tag in the commit message. The commit hash is abbreviated to 12 characters for readability. Typically that’s sufficient to avoid hash collisions. checkpatch will complain if the hash is formatted incorrectly.

    Why is this important? For one, maintainers of stable and distribution kernels automatically backport commits in upstream which have a Fixes tag. Secondly, for an open source project which thousands of people work on, it helps if commit messages are formatted consistently, coding style is consistent etc.

  5. Thanks for posting Chris! I too am glad that this resulted in a new patch being submitted (thank you l1k!). I was hesitant to do it myself because I didn’t understand how to submit a patch that addressed the requested changes while still maintaining correct attributions. l1k explained the process over at the Hacker News post for future reference for anybody.

    Also, thanks for your comment betaporter! This post wasn’t meant to be a criticism of any people or processes. I can understand the situation from different perspectives. It does seem kind of silly to see patches rejected for small things like that, but to be fair, the expected format of the Fixes tag is documented at kernel.org. And the reviewers were polite about everything too. With that said, I do think the Linux patch submission process can feel intimidating, especially for someone like me who has very little experience with it.

    Thanks l1k for explaining how the Fixes tag works. That was very educational! I did not realize that it was used for automatic backporting.

  6. @l1k:

    Your explanation of the process doesn’t address the elephant in the room.

    Appletalk was broken by a commit that clearly had not been tested by patch submitter. The change was approved because it followed to the patch submission process. Three attempts to fix this were rejected because the fix was not compliant with extraordinarily minor parts of the patch submission process.

    My conclusion was that the folks gatekeeping kernel break-fixes were much more interested in adhering to self-generated bureaucracy than actually fixing broken code, much less preventing code from being needlessly broken in the first place.

    I first submitted the fix because I had went through the trouble of tracking down the issue and didn’t want someone else to go through the same pain. I don’t even *use* Linux when I have a choice; {Free,Net}BSD have been my go-to server platforms for nearly three decades. The reason that I had considered Linux for this application was solely because I was repurposing a spare Raspbeery Pi as the netatalk server.

    When the first rejection came back, I put it on the “deal with later” pile because I’d already did my bit for the community. I fixed my use case by using NetBSD. The patch was now on public record and someone hitting the same issue could easily find it.

    A year later I’m going through the kernel source to debug yet another Linux oddness and I check to see if anyone has fixed the Appletalk issue. They hadn’t, so I resubmit the patch. I didn’t run checkpatch; again, it’s a one-liner that a reasonable person would approve *and* it was (to my eye, at zero-dark-thirty in the morning) to be compliant with the patch submission requirements.

    It was then rejected because it used the long-verssion commit ID reference. By the same guy who approved the original broken code. With detectable snarkiness that he’d been well-known for in the late nineties.

    So I said “screw it, life’s too short” and went on with my life. Until I was cc’ed on the patch commit. Which didn’t happen because someone else had successfully followed the patch submission process, but because *it was written up on his blog*.

    That, sir, is a very nearly perfect example of a broken process.

  7. @Doug Brown:

    I forked netatalk awhile back with the intention of reconciling the Linux/NetBSD conflicts, backporting security fixes from the 3.x branch, and properly dealing with AsanteTalk bridges.

    It’s at https://github.com/christopherkobayashi/netatalk-classic … feel free to hack on it if you want, I’ll grant you direct commit access. Just so you don’t have to re-invent the wheel.

  8. @Chris:

    It would be good if you guys could form a group which gets cc’ed on appletalk-related patches so that you get a chance to test/review stuff before it gets merged, thereby preventing breakage in the future. To do this you need to amend the MAINTAINERS entry with an “R: Real Name ” line for each person and change the status to “Supported” instead of “Odd fixes”. When someone looks up the people responsible for a file using get_maintainer, they’ll get your e-mail addresses so you’ll automatically be cc’ed. There’s no obligation to review every single patch, but that way you’ll be able to resolve problems more quickly and you can put your foot down if crap gets submitted.

    Arnaldo Carvalho de Melo used to be listed as Appletalk maintainer until he removed himself with 0c59d28121b9. Arnd Bergmann then tried to kill the protocol entirely with a6238f21736a but the change was reverted with 0ffbf8bf21db.

    Appletalk was broken by a commit that clearly had not been tested by patch submitter.

    LWN publishes a list of the most active developers each release and some folks seem to mistake it for a highscore table. If you look at the 5.1 statistics, the author of the offending commit e2bcd8b0ce6e claimed the #2 spot. It’s nigh impossible to submit 150 commits for a single release without sacrificing patch quality. Some maintainers have therefore started to summarily reject patches from developers who try to game the system and submit half-baked patches in a “fire and forget” fashion.

    Another appletalk commit by the same author broke the build and Arnd Bergmann had to mop up the mess with 27da0d2ef998.

    Instead of trying to fix the regression introduced by the offending commit, you can also ask the maintainer to revert it. Regressions are a perfectly fine justification to revert commits.

    That, sir, is a very nearly perfect example of a broken process.

    Maintainers, in particular of large subsystems such as networking, are under a deluge of patches, so the decision to accept or reject a patch might be a matter of a couple of seconds. Nothing personal. Some subsystems have switched to group maintainership to spread the load. The graphics subsystem, which is also large, has found an innovative solution by handing out commit rights to a couple dozen folks who watch each others’ backs, see Maintainers Don’t Scale. I’ve heard that the networking subsystem is considering moving to GitHub or GitLab to lower the barrier to entry for contributors. I don’t know what became of that.

  9. @l1k:

    Thank you for your thoughtful reply. It deserves a response, so I will … but I fear that we’re at cross-purposes here, so I don’t think I’ll reply beyond this. This matter has already absorbed more time and energy then it really deserves. Thanks for understanding.

    There seems to be a fundamental disconnect between what you (and by extension, the Linux netdev community) consider “contributing” and what I consider “contributing”.

    I’m not interested (and, actually, now actively disinterested) in ongoing interaction with the Linux netdev community. I’m the most common sort of contributer — the type that finds something broken, fixes it, and notifies the maintainer of both the problem and the solution.

    We aren’t supplicants respectfully requesting that our code be included in the kernel to stroke our ego. We’re trying to help out. If our help isn’t wanted, then we stop helping.

    Think of it like this: you see that your neighbor has a broken window. You happen to have a window pane that will fix the problem, so to be a good neighbor you offer the pane to the neighbor. The neighbor immediately hands you a pamphlet instructing you to frame your offer in fenya, and he won’t even consider your assistance unless you get every single verb tense absolutely correct.

    You might try to humor your neighbor — most people won’t, but you’re trying to help out. If you do, and you get it wrong, he tells you to re-read the pamphlet and try again. At that point it’s completely reasonable to walk away.

    Whatever internal processes the neighbor has that causes him to adhere strictly to protocol isn’t your concern. You’re not trying to join his family — you’re being nice. It’s the neighbor’s problem, not yours.

    When considered from that standpoint, most of your (well-written) response is irrelevant to the situation at hand (although it might help others in the future who want to interact with netdev). I will therefore only address a few points — again, thank you in advance for understanding.

    The volume of patches generated by any particular submitter isn’t really relevant. The quality of the patches is relevant, however.

    It is theoretically the gatekeeper’s responsibility to prevent low-quality code from being committed. That appears to be the entire reason for the sign-off process.

    Looking again at the various responses from the netdev gatekeepers (specifically Miller), no responsibility is being taken for approving a commit that breaks functionality. Instead, Miller pushes the onus onto the reporter — making snide public comments like “shows how many people are actually using appletalk if it doesn’t even load since v5.1”.

    Miller ignores the fact that he approved the commit that broke support. Miller doesn’t seem to understand that it’s not okay to break things even if they’re seldom used. A reasonable person, when told that they broke something, wouldn’t respond with “it’s not being used anyway”. A reasonable person would fix the problem. Miller doesn’t do that.

    The prevailing attitude appears to be “approve patches that fit the submission rules, and if the code breaks, it will be reverted”. That doesn’t seem to be what’s actually happening — the bar for fixing broken code appears to be *higher* than the bar for breaking the code. That doesn’t put the codebase in question in a favorable light.

    So that’s it. I’ve had my say. In the future, if I find an issue with code in the kernel, I won’t waste my time submitting a patch to help others out. I’ll use something else instead. Helping the netdev community isn’t worth the trouble.

    Thanks for reading. Again, I do appreciate your response … I just disagree with the mindset displayed by the response.

  10. Hello fellow vintage Macintosh users =)

    @Doug, fantastic article! Well done on debugging. You have a pretty cool blog too.

    @Chris, thanks for your contributions. I know it can be frustrating submitting patches upstream, and I urge you not to give up entirely.

    @l1k, thanks for submitting the fix and getting it accepted.

    @Doug:
    > I have no idea if it will be possible to convince Ubuntu to pull this fix into their kernels once it’s released. It’s not a high-priority bug fix because nobody uses AppleTalk anymore.

    Doug, I can handle this for you. @l1k got it accepted into net-next, here:
    https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=d0f6ba2ef2c1c95069509e71402e7d6d43452512

    It should hit mainline during the 5.9 merge window. The commit is CC’d for upstream stable, so we will see if Greg KH decides it is important enough to be backported to the stable kernels. If it does, it will be included in Ubuntu automatically. If it doesn’t, well, I will submit the patch for SRU to the Ubuntu kernels for you.

    I’ll keep an eye on the patch to see what happens, and I’ll make sure Ubuntu gets fixed.

    Thanks,
    Matthew Ruffell

    (I like my IICX with its Radius Pivot screen =D, and LCII and LCIII machines).

  11. @Chris Thanks for inviting me to collaborate on the GitHub fork! That is super useful — maybe I can use this fork instead of binary patching my copy of netatalk. 🙂

    @Matthew Thanks for your kind comments! I had observed that the patch had made it into net-next, but wasn’t sure what the next part of the process would be. That’s great! I appreciate all of the useful info you provided about the patch backport process, and that is very kind of you to make sure that Ubuntu gets the patch. Thank you so much!

    In the meantime, I went ahead and made a patched copy of the appletalk module source, and set up DKMS to automatically compile and install it whenever I upgrade my kernel.

  12. @Doug Brown:

    No worries.

    I spent a little time this morning cleaning up the Linux build options. ./configure without args should do the right thing on both Linux and NetBSD now.

    I also added MacIPGW as a submodule in etc (it’s not being built by default just yet). My SE/30 and my wife’s Classic II are able to speak TCP/IP via AppleTalk through the netatalk bridge. I haven’t tested it on Linux, but the support appears to be there.

  13. @Doug:

    The patch landed in mainline with this commit:

    commit d0f6ba2ef2c1c95069509e71402e7d6d43452512
    Author: Vincent Duvert
    Date: Sun Aug 2 07:06:51 2020 +0200
    Subject: appletalk: Fix atalk_proc_init() return path

    It has been backported to the stable trees, and was released this morning in 5.4.58 and 5.7.15. Seems to be missing from 5.8.x, hopefully it turns up in 5.8.2. Thanks Greg K-H for picking up the patch!

    The Ubuntu kernel team will automatically pull in the 5.4.58 patches to the Ubuntu 5.4 kernel, and it will likely be included in the next SRU cycle.

    I will write back once the fix has landed in an Ubuntu kernel.

    Thanks,
    Matthew

  14. Thank you for the update Matthew! I was CC’ed on Greg K-H’s additions to 5.7 and 5.4, so I had an inkling it was happening. I’m impressed by how quickly new stable kernels were released! This whole experience has been super educational.

  15. Chris M. @ 2020-08-13 11:33

    I got bit by this bug last week trying to get netatalk 2.2.6 running on Ubuntu 20.04 and Debian Bullseye. Right now Bullseye is in testing with the 5.7 kernel. Hopefully the fix finds its way into whatever kernel ships in Bullseye.

    I also found out that the procedure for compiling specific kernel modules has changed with 5.x kernels. Ugh.

    Also pulled out the grouchy Shiva Fastpath 5 to do some serious testing with LocalTalk clients against netatalk. Seems to work OK at the moment…..

Add your comment now