
Archives for April, 2020

Wlroots and Phosh on Samsung S7

April 19th, 2020

A few weekends ago I left my Samsung S7 running GNOME on software-rendered X11. This kind of works as a demo, but it’s slow and clunky, so I followed up by trying to get Phosh running. Phosh is a gnome-shell replacement for Purism’s Librem5. Instead of Mutter it uses phoc as its Wayland compositor, and phoc in turn is based on wlroots, the compositor-as-a-library component of Sway.

It should be much easier to add a hwcomposer backend to wlroots than to Mutter, and in fact someone had already started: NotKit/wlroots. I took this and rebased it on the latest upstream wlroots tag, hacked the code around until it compiled with the new interface, ran the example app and… the screen flashed green for a second, then the kernel panicked and the phone rebooted. Ouch.

<0>[ 7300.959344] I[0:      swapper/0:    0] Kernel panic - not syncing: Unrecoverable System MMU Fault!!
<0>[ 7300.959382] I[0:      swapper/0:    0] Kernel loaded at: 0x8013c000, offset from compile-time address bc000
<3>[ 7300.959438] I[0:      swapper/0:    0] exynos_check_hardlockup_reason: smc_lockup virt: 0xffffffc879980000 phys: 0x00000008f9980000 size: 4096.
<0>[ 7300.959489] I[0:      swapper/0:    0] exynos_check_hardlockup_reason: SMC_CMD_GET_LOCKUP_REASON returns 0x1. fail to get the information.
<0>[ 7300.959534] I[0:      swapper/0:    0] exynos_ss_prepare_panic: no core got stucks in EL3 monitor.

The panic log is not very helpful: there’s no user stack trace.

After a painful few hours debugging by adding prints and sleeps and comparing against the working test_hwcomposer from libhybris I managed to fix it. I’ve pushed a hwcomposer-0.10.1 branch here.

Then I built phoc and phosh, linking against the modified wlroots and libhybris. To my surprise it Just Worked, with the exception of touch input. Input requires enabling the libinput backend of wlroots, and that in turn requires an active “session”. Session in the systemd world means being associated with a “seat” in systemd-logind. We can do that by starting phoc inside a systemd service and associating it with a TTY. I copied phosh.service from the Librem5 package and edited it for my system.

Unfortunately phoc then hangs at startup inside the wlroots libinput backend polling for sd_seat_can_graphical(..) to return true. Logind seems to make some people very angry, but debugging it with the source code and loginctl wasn’t too bad.

$ loginctl 
    132 1000 nick            
    162 1000 nick seat0 tty7     <---------------
      4 1000 nick            
      6 1000 nick            
      7 1000 nick            
     c2    0 root       pts/4
6 sessions listed.

Here phoc is running on seat0 which is attached to /dev/tty7.

$ loginctl show-seat seat0 
CanGraphical=no    <-----------
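The same check can be scripted rather than read by eye. The following is a minimal Python sketch, assuming `loginctl show-seat` prints one KEY=VALUE pair per line as in the output above. The helper names `parse_properties` and `seat_can_graphical` are made up for this illustration.

```python
def parse_properties(text):
    """Turn KEY=VALUE lines into a dict, ignoring anything else."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key:
            props[key] = value
    return props


def seat_can_graphical(show_seat_output):
    """True only if the seat reports CanGraphical=yes."""
    return parse_properties(show_seat_output).get("CanGraphical") == "yes"


# Sample text standing in for the output of `loginctl show-seat seat0`.
sample = "Id=seat0\nCanGraphical=no\n"
print(seat_can_graphical(sample))  # prints False
```

On a real system the text would come from running the command, for example through `subprocess.run(..., capture_output=True, text=True)`.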

From reading the logind source, CanGraphical is true if there is a device attached to the seat that has the udev TAG attribute with value "master-of-seat". Normally this attribute is added to graphics devices by the udev rules systemd ships in /lib/udev/rules.d/71-seat.rules. The S7 has a special “decon” graphics driver so none of the standard rules match. But it’s easy to add a custom rule:

SUBSYSTEM=="graphics", KERNEL=="fb0", DRIVERS=="decon", TAG+="master-of-seat"

After reloading and retriggering the udev rules, the framebuffer device now has this tag:

$ udevadm info /dev/fb0
P: /devices/13960000.decon_f/graphics/fb0
N: fb0
L: 0
S: graphics/fb0
E: DEVPATH=/devices/13960000.decon_f/graphics/fb0
E: DEVNAME=/dev/fb0
E: SUBSYSTEM=graphics
E: ID_PATH=platform-13960000.decon_f
E: ID_PATH_TAG=platform-13960000_decon_f
E: ID_FOR_SEAT=graphics-platform-13960000_decon_f
E: ID_SEAT=seat0
E: DEVLINKS=/dev/graphics/fb0
E: TAGS=:seat0:seat:master-of-seat:      <-----------
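For a scripted check of the tag, the `E: TAGS=` line can be split on colons. This is a small Python sketch that assumes the `udevadm info` layout shown above; `device_tags` is a hypothetical helper name, and the arrow annotation in the listing is not part of the real output.

```python
def device_tags(udevadm_info_output):
    """Return the set of tags from the 'E: TAGS=' line, or an empty set."""
    prefix = "E: TAGS="
    for line in udevadm_info_output.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            # Tags are colon-separated with a leading and trailing colon.
            return {tag for tag in line[len(prefix):].split(":") if tag}
    return set()


# Sample text standing in for the output of `udevadm info /dev/fb0`.
sample = "N: fb0\nE: ID_SEAT=seat0\nE: TAGS=:seat0:seat:master-of-seat:\n"
print("master-of-seat" in device_tags(sample))  # prints True
```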

And after restarting phoc/phosh touch is working! 😀

From left: Gnome calculator, app drawer, Gnome terminal and squeekboard onscreen keyboard

Installing GNU/Linux on my Samsung S7

April 6th, 2020

I want to do something useful with my old Samsung S7. Previously I’ve tried LineageOS, and while it works OK, it wasn’t as stable as the stock Samsung ROM and you still end up installing a lot of proprietary apps on it by necessity. I’d much rather run a proper GNU/Linux distro on it. This post is documenting my efforts at doing that, with some minor success.

A failed mainlining

My first attempt was to use PostmarketOS. This is a proper free/libre distribution in that it doesn’t rely on any Android blobs for hardware access. Which is great if your SoC/GPU/modem/etc. has mainline Linux support. But I have the European S7 which contains the Samsung Exynos 8890 SoC, for which the only publicly available source code is the Android 3.18 kernel dump. For graphics this means you’re stuck with an unaccelerated framebuffer, which is a non-starter for any modern GUI.

The solution to this would be to update the Samsung provided kernel to a more recent version of Linux, which has free drivers for the Mali GPU, etc. I tried rebasing their code on a slightly more recent 4.4 kernel but once I looked into the SoC’s PCIe and USB drivers I realised how enormous the task was and gave up. Sadly unless Samsung do the work themselves or even just release the 8890 datasheet this device is never going to run anything later than 3.18.


Halium is an interesting project for devices stuck in this situation. Drivers for Android devices are usually split between a minimal GPL’d kernel driver and a proprietary userspace blob (HAL). Halium allows you to run the userspace blobs inside an LXC container, using a minimal Android system.img without any of the UI level components. Hybris then allows you to link to and call the Android libraries from normal Linux programs linked against glibc. It’s the basis of UBPorts and Plasma Mobile.

There’s already some progress getting Halium to run on the S7. I picked this up and got it to the point where it can run GNOME under X11.

Debugging in the initrd

The Halium boot.img contains a very useful init-script which is great for debugging. You can configure it to enable the USB network device, start a telnet server inside the initrd, and wait for you to connect before it continues booting. This lets you chroot into the target filesystem, poke around, and most importantly read /proc/last_kmesg to debug kernel panics. Without this I would have given up pretty quickly, as with no serial port or other debug output there’s little feedback when something goes wrong.
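Saved kernel logs are long, so it helps to pull out only the lines around a crash. This is a rough Python sketch; the marker strings are an assumption based on the panics quoted in these posts plus the common "Unable to handle kernel" message, and `panic_lines` is a hypothetical helper name.

```python
import re

# Assumed markers for "something went badly wrong" lines in a kernel log.
MARKERS = re.compile(r"Kernel panic|Call trace|Unable to handle kernel")


def panic_lines(kmsg_text):
    """Return the log lines that match one of the panic markers."""
    return [line for line in kmsg_text.splitlines() if MARKERS.search(line)]


# Usage sketch: read the saved log from whatever path the kernel exposes
# and print only the interesting lines.
#   with open(path, errors="replace") as f:
#       print("\n".join(panic_lines(f.read())))
```

The matching is deliberately loose, so treat the result as a starting point for reading the full log rather than a complete report.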

The S7 has an annoying bug where the USB Ethernet MAC address is all zeros. I’ve put a patch for that here.
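The patch itself is not reproduced here. Purely as an illustration of the general idea, one common workaround for an all-zero address is to substitute a random, locally administered one (the kernel has `is_zero_ether_addr` and `eth_random_addr` for this). The Python sketch below shows the bit manipulation involved; the function names are hypothetical, and it is not a description of what the linked patch does.

```python
import os


def is_zero_mac(mac):
    """True if a colon-separated MAC address is all zeros."""
    return all(int(octet, 16) == 0 for octet in mac.split(":"))


def random_local_mac():
    """Return a random unicast, locally administered MAC address."""
    octets = bytearray(os.urandom(6))
    # Bit 0 of the first octet marks multicast, bit 1 marks "locally
    # administered": clear the former and set the latter.
    octets[0] = (octets[0] & 0xFE) | 0x02
    return ":".join(f"{b:02x}" for b in octets)


mac = "00:00:00:00:00:00"
if is_zero_mac(mac):
    mac = random_local_mac()
print(mac)
```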

Systemd-journald hangs the system

The default Halium rootfs image hangs very early in boot. On the GitHub issue someone noticed this was caused by journald and it would boot a bit further if you stop this from running. You can do this with systemctl mask systemd-journald.service.

The problem here is caused by the max77854_fuelgauge battery driver not implementing some properties to read its status from /proc. Journald gets stuck in a loop of polling these files and locks up the system. Why is journald trying to read battery information? No idea, but it’s trivial to fix this behaviour with a patch to the kernel.

Disable Android “paranoid network”

Android has some patches to disable all network access unless the user is a member of some specific group. This is useless for a normal Linux userland so disable it in the defconfig.
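As a sketch of the defconfig change, assuming the option is `CONFIG_ANDROID_PARANOID_NETWORK` (its usual name in Android kernel trees; check the Kconfig in your tree), the edit amounts to replacing the `=y` line with the standard "is not set" comment. `disable_option` is a hypothetical helper written for this example.

```python
def disable_option(defconfig_text, option):
    """Rewrite 'OPTION=...' as the Kconfig-style '# OPTION is not set' line."""
    disabled = f"# {option} is not set"
    out = []
    found = False
    for line in defconfig_text.splitlines():
        if line.startswith(option + "=") or line == disabled:
            out.append(disabled)
            found = True
        else:
            out.append(line)
    if not found:
        # Make the choice explicit even if the option was absent.
        out.append(disabled)
    return "\n".join(out) + "\n"


sample = "CONFIG_NET=y\nCONFIG_ANDROID_PARANOID_NETWORK=y\n"
print(disable_option(sample, "CONFIG_ANDROID_PARANOID_NETWORK"), end="")
```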

Replacing the Halium system image with Debian

After this we can boot up into the default Halium rootfs which is based on Ubuntu 16.04. The LXC Android container boots, loads all the firmware, and WiFi is working but not much else. I thought about trying some other stock Halium rootfs like Plasma Mobile or UBPorts, but really I’d rather set my phone up like a regular computer so I used debootstrap to install a minimal Debian filesystem in the /data partition on the phone. My home directory is on a 64GB microSD card so I can safely blat the OS and reinstall. This requires tweaking the boot script in the initrd because Halium normally loopback mounts a rootfs image inside /data and then does a switch_root into that, but now we have the OS installed directly in /data.

Sadly this just hangs on boot. Debugging with the initrd telnet interface, systemd is stuck unable to start any processes. Turns out this is because systemd 245 depends on the “ambient capabilities” feature which isn’t present in the 3.18 kernel. Google have backported this to the Android 3.18 series so we can just apply that patch on the Samsung kernel.
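One quick hint as to whether a running kernel has ambient capabilities is the `CapAmb:` field in `/proc/<pid>/status`, which mainline has exposed since Linux 4.3. The Python sketch below only checks for that field; `has_ambient_caps` is a hypothetical name, and a backport could in principle add the feature without the field, so a missing line is a hint rather than proof.

```python
def has_ambient_caps(status_text):
    """True if a /proc/<pid>/status dump contains a CapAmb field."""
    return any(line.startswith("CapAmb:") for line in status_text.splitlines())


# Usage sketch on the device:
#   with open("/proc/self/status") as f:
#       print(has_ambient_caps(f.read()))
```

With the backported patch applied the field should appear, assuming the backport includes the /proc change.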

This gets us a bit further and then the kernel panics while starting udevd.

<0>[  132.035960]  [5:         v4l_id:  787] Call trace:
<0>[  132.035979]  [5:         v4l_id:  787] [<ffffffc0000ec328>] dump_backtrace+0x0/0x144
<0>[  132.035990]  [5:         v4l_id:  787] [<ffffffc0000ec5ec>] die+0x140/0x228
<0>[  132.036009]  [5:         v4l_id:  787] [<ffffffc0000f82b8>] __do_kernel_fault+0xb4/0xd8
<0>[  132.036019]  [5:         v4l_id:  787] [<ffffffc0000f858c>] do_page_fault+0x2b0/0x2fc
<0>[  132.036029]  [5:         v4l_id:  787] [<ffffffc0000f86c0>] do_translation_fault+0xe8/0x150
<0>[  132.036039]  [5:         v4l_id:  787] [<ffffffc0000e5268>] do_mem_abort+0x38/0xa4
<0>[  132.036050]  [5:         v4l_id:  787] [<ffffffc0000e7d70>] el1_da+0x20/0x78
<0>[  132.036065]  [5:         v4l_id:  787] [<ffffffc00067d00c>] v4l_querycap+0x28/0x50
<0>[  132.036076]  [5:         v4l_id:  787] [<ffffffc000680eb4>] __video_do_ioctl+0x164/0x254
<0>[  132.036085]  [5:         v4l_id:  787] [<ffffffc000681250>] video_usercopy+0x2ac/0x504
<0>[  132.036094]  [5:         v4l_id:  787] [<ffffffc0006814b8>] video_ioctl2+0x10/0x1c
<0>[  132.036103]  [5:         v4l_id:  787] [<ffffffc00067bd3c>] v4l2_ioctl+0x78/0x130
<0>[  132.036117]  [5:         v4l_id:  787] [<ffffffc00068c468>] do_video_ioctl+0xfd8/0x1d04
<0>[  132.036128]  [5:         v4l_id:  787] [<ffffffc00068d1ec>] v4l2_compat_ioctl32+0x58/0xb4
<0>[  132.036142]  [5:         v4l_id:  787] [<ffffffc000230bc8>] compat_SyS_ioctl+0x10c/0x1250

This is in some ioctls for the camera sensor driver. I think the problem here is that udev is enumerating the devices before the Android HAL blobs have had a chance to do their initialisation. I sprinkled some random NULL checks and it doesn’t crash anymore, but I’m not really happy with this so I haven’t pushed the patch.

Halium LXC container

Next I copied all of the lxc-android repository onto the device, as well as the Android system.img from the default Halium install. This contains all the scripts to start the LXC Android container. The LXC config file in the repository is from an older version of LXC than current Debian unstable. It’s easy to upgrade in-place with lxc-update-config -c /var/lib/lxc/android/config.

Enable the services that mount the Android filesystems:

systemctl enable android-mount.service
systemctl enable system.mount

Then reboot and check the Android side started up with /system/bin/logcat. This should load all the firmware blobs and enable the WiFi interface.

Hybris tests and udev permission problems

Hybris comes with a bunch of useful test programs for checking the various Android wrappers like GLES, sound, camera, etc. are working. Unfortunately they all failed with some cryptic error messages:

nick@samsung:~$ test_glesv2 
library "libgui.so" wasn't loaded and RTLD_NOLOAD prevented it
ERROR: The DDK is not compatible with any of the Mali GPUs on the system.
The DDK was built for 0x880 r2p0 status range [0..15], but none of the GPUs matched:
test_glesv2: ../../tests/test_glesv2.c:113: main: Assertion `eglGetError() == EGL_SUCCESS' failed.

A quick check in Android’s logcat reveals a lot of failures to open devices:

03-21 08:15:14.350     0  1574 D libEGL  : failed to load libgui: dlopen failed: 
03-21 08:15:14.356     0  1574 D libEGL  : loaded /vendor/lib/egl/libGLES_mali.so
03-21 08:15:14.368     0  1574 E ion     : open /dev/ion failed!
03-21 08:15:14.368     0  1574 E gralloc : /dev/graphics/fb0 Open fail
03-21 08:15:14.368     0  1574 E gralloc : Fail to init framebuffer
03-21 08:15:14.368     0  1574 E ion     : open /dev/ion failed!
03-21 08:15:14.368     0  1574 W libEGL  : eglInitialize(0xf786da60) failed (EGL_NOT_INITIALIZED)

This looks like a simple permission problem where the permissions on the device files are not set up the way Android expects. Simply following the Halium documentation and generating the udev rules file for the device then rebooting fixes this. Patch here.
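The rules are derived from Android's ueventd configuration, where each device entry has the form `/dev/NAME MODE OWNER GROUP`. The Python sketch below shows that conversion in the spirit of the Halium documentation; it is not the exact command from the docs, and `ueventd_to_udev` is a hypothetical helper name.

```python
def ueventd_to_udev(ueventd_text):
    """Convert '/dev/NAME MODE OWNER GROUP' lines into udev rule strings."""
    rules = []
    for line in ueventd_text.splitlines():
        fields = line.split()
        # Skip comments, /sys entries and anything without all four fields.
        if len(fields) < 4 or not fields[0].startswith("/dev/"):
            continue
        name = fields[0][len("/dev/"):]
        mode, owner, group = fields[1], fields[2], fields[3]
        rules.append(
            f'ACTION=="add", KERNEL=="{name}", '
            f'OWNER="{owner}", GROUP="{group}", MODE="{mode}"'
        )
    return rules


# Made-up sample entries in ueventd.rc format.
sample = "# comment\n/dev/ion 0664 system system\n/sys/devices/foo enable 0660 root root\n"
for rule in ueventd_to_udev(sample):
    print(rule)
```

Entries under a subdirectory, such as `/dev/graphics/*`, may need hand editing afterwards, because udev's `KERNEL` key matches the kernel device name rather than the path under /dev.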

Now we get a little bit further and test_hwcomposer crashes inside some EGL initialisation function. I don’t really understand what’s going on but it seems to be some mismatch between the gralloc version libhybris is using and the gralloc version used by the Samsung blobs. (Gralloc is Android’s graphics memory allocation layer.) I hacked around this by forcing hybris to skip the check for gralloc v1. I’m not very happy about this so I’ll perhaps revisit it later.

UPDATE: better fix here: libhybris/libhybris#446.

And now the graphics, lights, and vibrator tests all work! I couldn’t immediately get camera and sound working, but I didn’t try very hard.

Wayland, X11, GNOME

I’ve been using GNOME3 on my desktop quite happily for a while now and I’d like to run it on my phone too. The launcher is touch friendly-ish and Purism have done a lot of work porting the various GNOME apps to a mobile form-factor using libhandy.

Ideally I’d run the Wayland variant of Mutter because both Wayland and the Android graphics stack are based on GLES and this should get us hardware acceleration for the UI. Unfortunately this is a bit of a non-starter as the Mutter Wayland backend is tied to DRM, which Android kernels don’t support. So we’re stuck with X11 or using a different compositor entirely.

Thankfully there’s the xf86-video-hwcomposer project that provides an X driver using Android’s hwcomposer and libhybris. This provides hardware acceleration on the X server side but any client side drawing is unaccelerated and uses Mesa’s llvmpipe software fallback.

And finally…

There’s really nothing like the triumphant feeling of doing apt install emacs on your phone followed by the despair of discovering there’s no way to press those C- and M- meta keys.