The DynDN.eS Blog

About DynDN.eS, eQmail, Gentoo & some other network stuff


PCI passthru

PCI passthru is a technique to reserve PCI devices at boot time and later hand these resources over to a (qemu) VM. The most common use case is for VGA cards (then often called “GPU passthru”), but it is not limited to GPUs. I used it more extensively here FIXME. Everything below was done on Gentoo Linux with kernel-4.4.26 and qemu-2.8.0. I assume some knowledge of qemu configuration and of how to set up a standard VM.

Hardware

The most important requirement is that the hardware MUST support IOMMU. Unfortunately, this is not a common feature in consumer hardware as of today. I think I don't have to mention virtualization support as a separate requirement. I used the following hardware:

  • Mainboard MSI C236M (Mini ATX) with chipset C236 (graphic, network, audio on-board)
  • Intel Xeon E3-1235L v5
  • Nvidia GeForce GT630 (GK208) as second graphic card (PCIe)
  • some secondary PCIe cards (not important here)

BIOS

At the BIOS level, IOMMU (VT-d) and virtualization (VT-x) have to be activated. The on-board graphics also has to be initialized as the primary device. Of course, with an AMD CPU the corresponding AMD features have to be used instead.

Configure the kernel

Some features have to be compiled into the kernel:

  • kvm support
  • iommu support
  • vfio-drivers
  • virtio drivers

I recommend building the virtio drivers as modules. Unfortunately, the features are spread across different submenus; e.g. the virtio-scsi driver is located under “SCSI low-level drivers” (on its own). Rebuild the kernel if necessary, but don't reboot yet.
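
On my 4.4 kernel the relevant options looked roughly like this (a sketch from memory; the exact symbol names can differ between kernel versions, so verify them with make menuconfig):

```
CONFIG_KVM=y
CONFIG_KVM_INTEL=y
CONFIG_INTEL_IOMMU=y
CONFIG_VFIO=y
CONFIG_VFIO_PCI=y
CONFIG_VFIO_PCI_VGA=y
# virtio drivers as modules
CONFIG_VIRTIO_PCI=m
CONFIG_VIRTIO_BLK=m
CONFIG_VIRTIO_NET=m
CONFIG_SCSI_VIRTIO=m
```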

Verify IOMMU

To check if IOMMU is available put the following into a script (e.g.  find-iommu-groups ):

#!/bin/bash
# list every IOMMU group and the PCI devices it contains
for iommu_group in $(find /sys/kernel/iommu_groups/ -maxdepth 1 -mindepth 1 -type d); do
    echo "IOMMU group $(basename "$iommu_group")"
    for device in "$iommu_group"/devices/*; do
        echo -n $'\t'; lspci -nns "$(basename "$device")"
    done
done

Make it executable and run it. The output should be like this (shortened):

$ ./find-iommu-groups
IOMMU group 1
        00:01.0 PCI bridge [0604]: Intel Corporation Sky Lake PCIe Controller (x16) [8086:1901] (rev 07)
        01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK208 [GeForce GT 630 Rev. 2] [10de:1284] (rev a1)
        01:00.1 Audio device [0403]: NVIDIA Corporation GK208 HDMI/DP Audio Controller [10de:0e0f] (rev a1)
IOMMU group 2
        00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:191d] (rev 06)
IOMMU group 3
        00:08.0 System peripheral [0880]: Intel Corporation Sky Lake Gaussian Mixture Model [8086:1911]
IOMMU group 4
...

First, this shows that IOMMU is available. Second, the important group in our case is IOMMU group 1: it contains the hardware we want to separate. The Nvidia audio device will not be used further here, but it could of course be passed through as well.

Beware that the IOMMU implementation is broken on some hardware!

The vfio-pci driver

Let's check out the hardware with  lspci -k . I shortened the output to the graphics cards:

00:02.0 VGA compatible controller: Intel Corporation Device 191d (rev 06)
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7972
        Kernel driver in use: i915
        Kernel modules: i915
...
01:00.0 VGA compatible controller: NVIDIA Corporation GK208 [GeForce GT 630 Rev. 2] (rev a1)
        Subsystem: CardExpert Technology GK208 [GeForce GT 630 Rev. 2]
        Kernel driver in use: nvidia
        Kernel modules: nvidia_drm, nvidia
...

The important lines are  Kernel driver in use: <driver name> . What we want to achieve is to replace the nvidia driver with  vfio-pci .

This is achieved by adding some options to the kernel command line. Thus we have to identify the hardware IDs first (of course we only need them for the Nvidia card):

$ lspci -nn | grep NVIDIA
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK208 [GeForce GT 630 Rev. 2] [10de:1284] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation GK208 HDMI/DP Audio Controller [10de:0e0f] (rev a1)

The ID is the bracketed vendor:device pair in the second to last field above. Now we add the following options to the kernel command line:

kernel <image> ... intel_iommu=on vfio-pci.ids=10de:1284,10de:0e0f
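
With GRUB2 as the boot loader this would typically go into /etc/default/grub, followed by regenerating the config (an assumption; adjust this to your own boot loader setup):

```
GRUB_CMDLINE_LINUX="... intel_iommu=on vfio-pci.ids=10de:1284,10de:0e0f"
```

followed by  grub-mkconfig -o /boot/grub/grub.cfg .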

The  vfio-pci  driver can also be bound dynamically at runtime, but I prefer not to do it that way!
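
For completeness, dynamic binding could look roughly like this (a sketch only; it needs root, and the card must not be in use by another driver):

```shell
# load the vfio-pci module, then tell it to claim our vendor:device IDs
modprobe vfio-pci
echo 10de 1284 > /sys/bus/pci/drivers/vfio-pci/new_id
echo 10de 0e0f > /sys/bus/pci/drivers/vfio-pci/new_id
# if the nvidia driver already claimed the card, unbind it first:
# echo 0000:01:00.0 > /sys/bus/pci/drivers/nvidia/unbind
```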

Now it is time to reboot. Afterwards it is a good idea to verify that the vfio-pci driver is actually in use. Check with  lspci -k  again:

00:02.0 VGA compatible controller: Intel Corporation Device 191d (rev 06)
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7972
        Kernel driver in use: i915
        Kernel modules: i915
...
01:00.0 VGA compatible controller: NVIDIA Corporation GK208 [GeForce GT 630 Rev. 2] (rev a1)
        Subsystem: CardExpert Technology GK208 [GeForce GT 630 Rev. 2]
        Kernel driver in use: vfio-pci
        Kernel modules: nvidia_drm, nvidia
...

If something goes wrong, check  dmesg  and the logs. Common failures are a typo in a device ID or a module that was not loaded.
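
Typos can be avoided by extracting the IDs from the lspci output instead of typing them. A small helper might look like this ( extract_ids  is a hypothetical name of my own; the sed pattern assumes the bracketed [vendor:device] format shown above):

```shell
# extract_ids: read lspci -nn style lines on stdin, pull the last
# bracketed [vendor:device] pair out of each line and join them with commas
extract_ids() {
    sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\][^[]*$/\1/p' | paste -sd, -
}

# usage on a real system:
#   lspci -nn | grep NVIDIA | extract_ids
```

On the hardware above this yields the string for the  vfio-pci.ids=  option directly.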

Configure qemu

The following is not a full qemu start script. It shows the important options only!

I like to use a variable to group options. The  $GPU  variable contains the options I use. Add the following (or similar) to the qemu start script:

GPU="-device vfio-pci,host=01:00.0,multifunction=on,x-vga=on -vga none"
 
qemu-system-x86_64 -cpu host,kvm=off ... $GPU ... -cdrom <bootable-cd-image>

The  -cdrom  option is for an initial test. After connecting the output of the second VGA card (here: the GT630) to a monitor, the VM can be started. It can take a few seconds, but then the second screen should show something; otherwise carefully check the steps above again. To stop the VM, simply kill it on the host system.
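
Put together, a minimal test invocation could look like this (a sketch under the assumptions above; memory size, core count and the image path are placeholders):

```shell
#!/bin/bash
# minimal qemu test invocation for the GPU passthru above (sketch)
GPU="-device vfio-pci,host=01:00.0,multifunction=on,x-vga=on -vga none"

qemu-system-x86_64 \
    -enable-kvm \
    -cpu host,kvm=off \
    -m 4096 -smp 2 \
    $GPU \
    -cdrom /path/to/bootable-cd.iso
```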

Caveats

As you may (or may not) have realized, there was no keyboard/mouse available during the initial test, because input normally depends on the emulated qemu VGA device. Thus we have to bind a keyboard/mouse to the VM through its device ID. For USB devices, use  lsusb :

$ lsusb
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 003: ID 10d5:000d Uni Class Technology Co., Ltd
Bus 001 Device 002: ID 046d:c330 Logitech, Inc.
Bus 001 Device 127: ID 046d:c52b Logitech, Inc. Unifying Receiver
Bus 001 Device 126: ID 05e3:0608 Genesys Logic, Inc. Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

In the output above, the keyboard is the device in line 3 and the mouse the one in line 4. Add the ID to the qemu start script like this (example: keyboard only):

KBD="-usb -usbdevice host:046d:c330"
 
qemu-system-x86_64 -cpu host,kvm=off ... $GPU $KBD ... -cdrom <bootable-cd-image>

Again, this is just an example. The mouse could be bound similarly with its own ID. However,

I don't use this technique, because the USB device is not available on the host anymore as long as the VM is up and running!

But it is good enough for testing purposes. If you bind the keyboard and/or the mouse this way, it is a good idea to have a second one available.

Summary

The above shows how to use PCI hardware of the qemu host exclusively inside a VM. The VGA card is just an example; other PCI devices can be separated this way as well.
