当前位置: 代码迷 >> 综合 >> How to debug QEMU (by quqi99)
  详细解决方案

How to debug QEMU (by quqi99)

热度:95   发布时间:2023-12-13 09:14:54.0

作者:张华 发表于:2016-08-11
版权声明:可以任意转载,转载时请务必以超链接形式标明文章原始出处和作者信息及本版权声明
( http://blog.csdn.net/quqi99 )

LOG

1, Capture DEBUG for QEMU & libvirt but only WARN + ERROR for the rest, modify the file /etc/libvirt/libvirtd.conf

# 1 debug, 2 info, 3 warn, 4 error, 5 fatal  - https://libvirt.org/kbase/debuglogs.html
# Capture DEBUG for QEMU & libvirt but only WARN + ERROR for the rest.
# https://object-storage-ca-ymq-1.vexxhost.net/swift/v1/6e4619c416ff4bd19e1c087f27a43eea/www-assets-prod/Uploads/Debugging-libvirt-QEMU-in-OpenStack-Tokyo-2015.pdf
log_filters="1:qemu 1:libvirt 3:security 3:event 3:util 3:file"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"log_level = 3
log_filters="1:libvirt 1:qemu 1:conf 1:security 3:event 3:json 3:file 3:object 3:remote 1:util 1:util.eventpoll"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
sudo systemctl restart libvirtd

2, To log all libvirt API calls, export (/etc/default/libvirt-bin). Specify multiple log outputs(journald and virsh.log)

LIBVIRT_DEBUG=1
LIBVIRT_LOG_FILTERS="1:qemu"
LIBVIRT_LOG_OUTPUTS="1:journald 1:file:virsh.log"

3, sudo service libvirt-bin restart
/var/log/libvirt/qemu/instance-00000002.log
/var/log/libvirt/libvirtd.log

4, qemu’s log, add the following
-d guest_errors,unimp -D /tmp/qemu.log

5, query log
journalctl /usr/sbin/libvirtd [-f]
journalctl -o verbose /usr/sbin/libvirtd
journalctl -u libvirtd -l -p err ––since=today

ftrace trace howto

qemu supports multiple trace backends like nop, dtrace, ftrace, log, simple, ust, This is a blog about ftrace howto - http://git.qemu.org/?p=qemu.git;a=commit;h=781e9545dbe54d2f0a530edbf48fd0ac1e77244c

./configure --trace-backend=ftrace
make
sudo echo 1 > /sys/kernel/debug/tracing/events/kvm/enable
qemu-system-x86_64 -enable-kvm -trace events=/home/events
less /sys/kernel/debug/tracing/trace
qemu-system-x86-23226 [002] d… 116142.685923: kvm_exit: reason IO_INSTRUCTION rip 0xc45b info 700040 0
qemu-system-x86-23226 [002] … 116142.685924: kvm_pio: pio_write at 0x70 size 1 count 1

–enable-trace-backend=simple
#check the list of traceable functions
hw/virtio/virtio.c
-trace events=/tmp/mytracefile
virtio-*
balloon*
qemu-memalign
#translate the result
sudo qemu-2.5+dfsg/scripts/simpletrace.py qemu-2.5+dfsg/trace-events ./trace-30587

https://rwmj.wordpress.com/2016/03/17/tracing-qemu-guest-execution/
http://blog.vmsplice.net/2011/03/how-to-write-trace-analysis-scripts-for.html
http://git.qemu.org/?p=qemu.git;a=blob_plain;f=docs/devel/tracing.txt;hb=refs/heads/master

Observe any arbitrary QMP events

sudo virsh qemu-monitor-command --hmp trusty 'help'
# https://qemu.weilnetz.de/doc/qemu-doc.html#pcsys_005fmonitor
sudo virsh qemu-monitor-command --hmp trusty 'stop'
sudo virsh qemu-monitor-command --hmp trusty 'c'
sudo virsh qemu-monitor-command --hmp trusty 'info network'
#sudo virsh qemu-monitor-command instance-00000002 --pretty '{"execute":"query-commands"}'
#sudo virsh qemu-monitor-command instance-00000002 --pretty '{"execute":"query-block"}'# Then we can read related code about 'qemuProcessEventHandler -> processWatchdogEvent'
sudo virsh qemu-monitor-event instance-00000002 --pretty --loop#nova live-migrate --block-migrate vm1 $DEST-HOST
#sudo virsh migrate –verbose --copy-storage-inc --p2p --live vm1 qemu+ssh://root@dest/system
$ sudo virsh qemu-monitor-event instance-00000002 --pretty --loop
error: internal error: virsh qemu-monitor-event: no domain VSH_OT_DATA option
event POWERDOWN at 1470898098.552573 for domain instance-00000002: <null>
event NIC_RX_FILTER_CHANGED at 1470898099.457593 for domain instance-00000002: {"name": "net0","path": "/machine/peripheral/net0/virtio-backend"
}
event SHUTDOWN at 1470898100.598173 for domain instance-00000002: <null>
event STOP at 1470898100.600347 for domain instance-00000002: <null>
event SHUTDOWN at 1470898100.601556 for domain instance-00000002: <null>
event RESUME at 1470898102.503793 for domain instance-00000002: <null>
event NIC_RX_FILTER_CHANGED at 1470898122.381443 for domain instance-00000002: {"name": "net0","path": "/machine/peripheral/net0/virtio-backend"
}
**

Monitor

#virtio-balloon stats virtqueue testhua@node1:/images/kvm$ sudo qemu-system-x86_64 -enable-kvm -m 1024 -cpu host -drive if=virtio,cache=none,format=qcow2,file=./ubuntu14.04_test.qcow2 -device virtio-balloon-pci,id=virtio-balloon0 -S -monitor stdio 
QEMU 2.5.0 monitor - type 'help' for more information
(qemu) qom-set virtio-balloon0 guest-stats-polling-interval 5
(qemu) c
(qemu) savevm
(qemu) quithua@node1:/images/kvm$ sudo qemu-system-x86_64 -enable-kvm -m 1024 -cpu host -drive if=virtio,cache=none,format=qcow2,file=./ubuntu14.04_test.qcow2 -device virtio-balloon-pci,id=virtio-balloon0 -S -monitor stdio 
QEMU 2.5.0 monitor - type 'help' for more information
(qemu) loadvm 1
(qemu) c
#https://en.wikibooks.org/wiki/QEMU/Monitor
# Migration
(qemu) migrate tcp:127.0.0.1:1234   #with: -incoming tcp::1234(qemu) info blockstats
virtio0: rd_bytes=0 wr_bytes=20480 rd_operations=0 wr_operations=5 flush_operations=2 wr_total_time_ns=859822442 rd_total_time_ns=0 flush_total_time_ns=52308038 rd_merged=0 wr_merged=0 idle_time_ns=287986673745(qemu) gdbserver tcp::12345
Waiting for gdb connection on device 'tcp::12345'hua@node1:/bak/linux$ gdb ./qemu -q
./qemu: Success.
(gdb) target remote localhost:12345 
wget http://wiki.qemu.org/download/linux-0.2.img.bz2 -O linux-0.2.img.bz2
bunzip2 -d linux-0.2.img.bz2#create telent monitor
sudo qemu-system-x86_64 -hda /bak/images/linux-0.2.img -qmp tcp:localhost:4444,server --monitor stdio
telnet localhost 4444#create unix socket monitor
# 或者当用户在qemu虚拟机界面上输入Ctrl+ALT+2将进入qemu monitor; 然后按CTRL+ALT+1将恢复当正常的虚拟机窗口。
sudo qemu-system-x86_64 /bak/images/linux-0.2.img -qmp unix:/tmp/qmp-sock,server --monitor stdio
sudo nc -U /tmp/qmp-sock#When not using '--monitor stdio'
{ "execute": "qmp_capabilities" }
{"return": {}}
{ "execute": "query-status" }
{"return": {"status": "running", "singlestep": false, "running": true}}

附录 - VM随机死掉

遇到一个问题,客户在升级的时候发现VM随机死掉。

1, libvirtd daemon (vNetServer side) can handle the event VIR_EVENT_HANDLE_HANGUP in the function virEventPollFromNativeEvents

libvirtd (use virDriverLoadModule to interact with qemu driver) -> virNetDaemonRun -> virEventRunDefaultImpl -> virEventPollRunOnce -> virEventPollDispatchHandles (vireventpoll.c) -> virEventPollFromNativeEvents (set VIR_EVENT_HANDLE_HANGUP according to POLLHUP) -> virEventHandleCallback

int virEventPollToNativeEvents(int events){
int ret = 0;

if (events & VIR_EVENT_HANDLE_HANGUP)
ret |= POLLHUP;
return ret;
}

2, qemu process (vNetClient side) uses the following code path to register the event VIR_EVENT_HANDLE_HANGUP in the function qemuMonitorOpenInternal when starting a VM.

qemuHypervisorDriver -> qemuDomainCreateXML[qemu_driver.c] -> qemuProcessStart -> qemuProcessLaunch -> qemuProcessWaitForMonitor -> qemuConnectMonitor[qemu_process.c] -> qemuMonitorOpen -> qemuMonitorOpenInternal -> virEventAddHandle

if ((mon->watch = virEventAddHandle(mon->fd,VIR_EVENT_HANDLE_HANGUP |VIR_EVENT_HANDLE_ERROR |VIR_EVENT_HANDLE_READABLE,qemuMonitorIO,mon,virObjectFreeCallback)) < 0)

The unix socket between libvirt and qemu is ‘-chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-00029015.monitor’, the event POLLHUP represents vNetClient Side (qemu side) has closed socket. Why qemu side closed the socket, qemu side should have some problem, so the core dump file is needed for further analysis.

3, qemu side - qemu uses the following code path to start qemu monitor:

main(vl.c, text_console_init) -> mon_init_func -> monitor_init(bind char device for monitoring) -> monitor_qmp_event

starting qemu monitor needs a kind of char device, we are using unix socket, so by checking register_types here the char device should be CHARDEV_BACKEND_KIND_SOCKET, then we have the following code path.

register_types(CHARDEV_BACKEND_KIND_SOCKET) -> qmp_chardev_open_socket -> qemu_chr_open_socket_fd(qemu-sockets.c) -> socket_connect -> unix_connect_opts(opts, errp, callback, opaque) -> qemu_chr_socket_connected(it’s callback) -> check_report_connect_error

static void check_report_connect_error(CharDriverState *chr,Error *err){
TCPCharDriver *s = chr->opaque;
if (!s->connect_err_reported) {
error_report(“Unable to connect character device %s: %s”,
chr->label, error_get_pretty(err));
s->connect_err_reported = true;
}
qemu_chr_socket_restart_timer(chr);
}

void error_vprintf(const char *fmt, va_list ap)
{
if (cur_mon && !monitor_cur_is_qmp()) {
monitor_vprintf(cur_mon, fmt, ap);
} else {
vfprintf(stderr, fmt, ap);
}
}
4, VM can also be stopped by nova via:

start(service.py) -> init_host(manage.py) -> init_virt_events -> handle_events -> handle_lifecycle_event -> _sync_instance_power_state
_sync_power_states -> _sync_instance_power_state

So we can run:
juju config nova-compute config-flags=‘sync_power_state_interval=-1’

trace event

0) Set guest name/PID:
GUEST=instance-0000035f # from guest XML.
GUEST_PID="$(ps ax -o pid,cmd | awk "/qemu-system-x86_64 .*guest=$GUEST,/" | awk '{ print $1; exit }')"1) Start strace:
sudo strace -ttT -y -ff -o qemu.strace.$GUEST -p ${GUEST_PID} -e preadv -e pwritev -e io_submit
# Example output (read operations w/ io=threads and io=native):
... preadv(29</var/lib/uvtool/libvirt/images/focal.qcow>, ...) = 1048576 <0.004235>
... io_submit(0x7f6c65c71000, 1, [{aio_lio_opcode=IOCB_CMD_PREADV, aio_fildes=30</var/lib/uvtool/libvirt/images/focal.qcow>, ...) = 1 <0.000028>2) Start logging trace events:
ON_OFF=on
for EVENT in \
paio_submit_co \
paio_submit \
mirror_iteration_done \
mirror_yield_in_flight \
mirror_one_iteration \
mirror_restart_iter \
mirror_yield_in_flight \
mirror_yield_in_flight \
mirror_yield \
mirror_yield \
mirror_before_flush \
mirror_before_drain \
mirror_before_sleep \
mirror_start \
paio_submit \
qcow2_cache_entry_flush \
qcow2_cache_flush \
qcow2_cache_get \
qcow2_cache_get_replace_entry \
qcow2_cache_get_read \
qcow2_cache_get_done \
stream_one_iteration \
stream_start \
qcow2_writev_start_req \
qcow2_writev_start_part \
qcow2_writev_data \
qcow2_writev_done_part \
qcow2_writev_done_req \
qcow2_pwrite_zeroes_start_req \
qcow2_pwrite_zeroes \
commit_one_iteration \
commit_start \
bdrv_co_do_copy_on_readv \
bdrv_co_preadv \
bdrv_co_pwritev \
bdrv_co_pwrite_zeroes \
qcow2_l2_allocate \
qcow2_l2_allocate_get_empty \
qcow2_l2_allocate_write_l2 \
qcow2_l2_allocate_write_l1 \
qcow2_l2_allocate_done \
qcow2_l2_allocate_done \
qcow2_cluster_link_l2 \
qcow2_handle_copied \
qcow2_do_alloc_clusters_offset \
qcow2_cluster_alloc_phys \
qcow2_handle_alloc \
qcow2_alloc_clusters_offset \
backup_do_cow_enter \
backup_do_cow_skip \
backup_do_cow_process \
backup_do_cow_read_fail \
backup_do_cow_write_fail \
backup_do_cow_return \
blk_co_preadv \
blk_co_pwritev \
hd_geometry_lchs_guess \
hd_geometry_guess \
virtio_blk_req_complete \
virtio_blk_rw_complete \
virtio_blk_submit_multireq \
virtio_blk_handle_write \
virtio_blk_handle_read \
virtio_blk_data_plane_start \
virtio_blk_data_plane_stop \
; \
do
virsh qemu-monitor-command --hmp $GUEST trace-event $EVENT $ON_OFF
done3) You can watch output with:
sudo tail -f /var/log/libvirt/qemu/$GUEST.log
# Example output, one read:
7723@1584054270.402793:virtio_blk_handle_read ...
7723@1584054270.403131:blk_co_preadv ...
7723@1584054270.403389:bdrv_co_preadv ...
7723@1584054270.403720:qcow2_cache_get ...
7723@1584054270.403988:qcow2_cache_get_done ...
7723@1584054270.404429:bdrv_co_preadv ...
7723@1584054270.407088:virtio_blk_rw_complete ...
7723@1584054270.407309:virtio_blk_req_complete ...4) Wait for the problem to reproduce
After the problem reproduces:
---
- Stop strace (ctrl-c)
- Copy the guest log:
sudo cp /var/log/libvirt/qemu/$GUEST.log qemu-trace-events.$GUEST.log
- Repeat command to start trace events, but this time with 'off':
ON_OFF=off
for EVENT in \
...
; \
do
virsh qemu-monitor-command --hmp $GUEST trace-event $EVENT $ON_OFF
done
- Compress the log files, and upload to the case:
sudo tar cJf qemu-logs.$GUEST.$(date +'%F-%T' | tr : -).tar.xz qemu.strace.* qemu-trace-events.*

trace qemu

1, apparmor中(/etc/apparmor.d/abstractions/libvirt-qemu)添加下列配置,免得它抱怨:

/usr/bin/strace ux,
/bin/date ux,

2, 修改qemu-system-x86_64

sudo mv /usr/bin/qemu-system-x86_64 /usr/bin/qemu-system-x86_64-original
vim /usr/bin/qemu-system-x86_64
#!/bin/bash# sudo strace -p $(pidof /usr/sbin/libvirtd) -e trace=open,read,write,getsockname,socket,openat,close -s 1024 -o /tmp/libvirt.trace
strace -D -ff -x -tt -e trace=sendmsg,recvmsg -s 1024 -o /tmp/qemu.$(date +%s).trace -- /usr/bin/qemu-system-x86_64-original $@

20210716 update

migration因为下列原因持续两个多小时才结束.

2021-06-16 15:27:16.657+0000: 15631: error : qemuDomainObjBeginJobInternal:9560 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainMigratePrepare3Params)

enable log之后,libvirt向qemu发出qmp_capabilities的查询,但无返回.这样monitor未返回锁,导致其他所有的monitor命令都无法运行,这样还会表现对整个虚机的操作均无反应.

22808:2021-07-15 15:58:22.893+0000: 2510215: info : qemuMonitorIOWrite:452 : QEMU_MONITOR_IO_WRITE: mon=0x7f1b30033160 buf={"execute":"qmp_capabilities","id":"libvirt-1"}

下列日志说明了原因:

virQEMUCapsCacheLookupDefault:5579 : invalid argument: KVM is not supported

同时也看到了:

libvirtd.log:2021-07-02 12:35:06.209+0000: 18315: debug : virQEMUCapsIsValid:4762 : Outdated capabilities for '/usr/bin/qemu-system-x86_64': kernel version changed ('5.8.0-59-generic #66~20.04.1-Ubuntu SMP Thu Jun 17 11:14:10 UTC 2021' vs '5.4.0-77-generic #86-Ubuntu SMP Thu Jun 17 02:35:03 UTC 2021')

kvm找不着kvm capability的原因可参考:https://qkxu.github.io/2021/06/18/libvirt%E6%89%BE%E4%B8%8D%E5%88%B0%E6%94%AF%E6%8C%81kvm%E7%9A%84guest.html

  相关解决方案