Postcopy has always been a bit of a mystery to me; this time I finally found the time to study it carefully.
Put simply, postcopy means starting the virtual machine first and migrating afterwards.
Normally we first copy the VM's state, including memory and device state, to the destination, and only then start the VM there. Postcopy introduces a rather interesting idea: start the VM first, and migrate whatever it still needs afterwards. The most important piece of that is, of course, memory.
This section also takes memory as the main thread; the other parts that support postcopy are left aside for now. (Mainly because I haven't read them.)
Starting from usage
Before digging into the details, let's look at how postcopy is actually used. The overall procedure is similar to a normal migration; the one interesting difference is that postcopy is kicked off only after the migration has already started.
Taking a migration driven from the monitor as an example, the commands to run are:
migrate_set_capability postcopy-ram on # both source and destination
migrate_set_capability postcopy-blocktime on # both source and destination
migrate -d tcp:0:4444
migrate_start_postcopy # after first round of sync
The first two commands must be executed on both the source and the destination. Then postcopy has to be triggered separately, after the migration has been started; the recommendation is to do so only after at least one full round of precopy has run.
Starting postcopy
Executing the migrate_start_postcopy command does nothing more than set a flag to true:
hmp_migrate_start_postcopy
qmp_migrate_start_postcopy
atomic_set(&s->start_postcopy, true);
Once this flag is set, it is checked in migration_iteration_run; when the conditions are met, postcopy is started. Nothing magical so far.
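For reference, this is roughly what that check looks like, trimmed from migration_iteration_run() in migration/migration.c (QEMU around the 3.x era; field and helper names may differ in other versions):

/* Trimmed: once start_postcopy is set and the precopy-only pending set is
 * small enough, switch over to postcopy_start() instead of iterating. */
static MigIterateState migration_iteration_run(MigrationState *s)
{
    uint64_t pend_pre, pend_compat, pend_post, pending_size;
    bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;

    qemu_savevm_state_pending(s->to_dst_file, s->threshold_size,
                              &pend_pre, &pend_compat, &pend_post);
    pending_size = pend_pre + pend_compat + pend_post;

    if (pending_size && pending_size >= s->threshold_size) {
        /* Not done yet: decide between another precopy pass and postcopy */
        if (migrate_postcopy() && !in_postcopy &&
            pend_pre <= s->threshold_size &&
            atomic_read(&s->start_postcopy)) {
            if (postcopy_start(s)) {
                error_report("%s: postcopy failed to start", __func__);
            }
            return MIG_ITERATE_SKIP;
        }
        qemu_savevm_state_iterate(s->to_dst_file, in_postcopy);
    } else {
        migration_completion(s);
        return MIG_ITERATE_BREAK;
    }
    return MIG_ITERATE_RESUME;
}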
The interaction during postcopy
This is where the magic really starts happening -- postcopy_start() hides most of the secrets.
First, a simplified code flow:
postcopy_start(), quite similar to migration_completion()
migrate_set_state(MIGRATION_STATUS_ACTIVE, MIGRATION_STATUS_POSTCOPY_ACTIVE)
qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
global_state_store();
vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
migration_maybe_pause(ms, &cur_state, MIGRATION_STATUS_POSTCOPY_ACTIVE);
bdrv_inactivate_all();
qemu_savevm_state_complete_precopy(ms->to_dst_file, true, false);
ram_postcopy_send_discard_bitmap()
migration_bitmap_sync(), should be the last sync
postcopy_chunk_hostpages(), deal with TARGET_PAGE_SIZE (TPS) != host page size (HPS)
postcopy_chunk_hostpages_pass(ms, true, block, pds);
postcopy_chunk_hostpages_pass(ms, false, block, pds);
postcopy_each_ram_send_discard(), tell destination to discard pages (sketched after this flow)
postcopy_discard_send_init()
postcopy_send_discard_bm_ram()
postcopy_discard_send_finish()
qemu_savevm_send_postcopy_listen(fb); put the destination into the listen state
qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_LISTEN), destination will start postcopy_ram_listen_thread
qemu_savevm_state_complete_precopy(fb, false, false);
qemu_savevm_send_postcopy_run(fb);
qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RUN, 0, NULL);
qemu_savevm_send_packaged(ms->to_dst_file, bioc->data, bioc->usage), MIG_CMD_PACKAGED
ram_postcopy_migrated_memory_release(ms), release source pages that were already sent
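About the "tell destination to discard pages" step above: the source walks each RAMBlock's dirty bitmap and reports every run of still-dirty pages as a (start, length) pair, so that the destination drops its now-stale copy and faults those pages in again later. A paraphrased sketch of postcopy_send_discard_bm_ram() from migration/ram.c (the exact shape varies across QEMU versions):

/* Walk the block's dirty bitmap and send each run of dirty pages as a
 * (start, length) pair; the destination unmaps those ranges. */
static int postcopy_send_discard_bm_ram(MigrationState *ms,
                                        PostcopyDiscardState *pds,
                                        RAMBlock *block)
{
    unsigned long end = block->used_length >> TARGET_PAGE_BITS;
    unsigned long current = 0;
    unsigned long *bitmap = block->bmap;

    while (current < end) {
        unsigned long one = find_next_bit(bitmap, end, current);
        if (one >= end) {
            break;                      /* no dirty page left in this block */
        }
        unsigned long zero = find_next_zero_bit(bitmap, end, one + 1);
        unsigned long run = (zero >= end ? end : zero) - one;

        postcopy_discard_send_range(ms, pds, one, run);
        current = one + run;
    }
    return 0;
}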
The postcopy_start() flow above is already fairly involved, yet it only covers the source side of the migration. How does the destination interact with the source? The small flow diagram below lays the exchange out.
Source                                       Destination

migration_thread                             qemu_loadvm_state_main
                                               loadvm_process_command
               ---- ADVISE ---->                 loadvm_postcopy_handle_advise
                                                   check userfaultfd
                                                   check page size match
      |                                        |
      |                                        |
      v                                        v
migration_iteration_run
  qemu_savevm_state_iterate()
    or
  postcopy_start
    ram_postcopy_send_discard_bitmap
               ---- DISCARD --->                 loadvm_postcopy_ram_handle_discard
                                                   NOHUGEPAGE
                                                   clear receivedmap
                                                   unmap
      |                                        |
      |                                        |
      v                                        v
  qemu_savevm_send_postcopy_listen(fb)
               ---- LISTEN ---->                 loadvm_postcopy_handle_listen
                                                   setup userfaultfd
                                                     postcopy_ram_fault_thread
                                                     ram_block_enable_notify
                                                   postcopy_ram_listen_thread
                                                     qemu_loadvm_state_main
      |                                        |
      |                                        |
      v                                        v
  qemu_savevm_send_packaged(fb)
               ---- PACKAGED -->                 loadvm_handle_cmd_packaged
                                                   qemu_loadvm_state_main
                                                     loadvm_process_command
      |                                        |
      |                                        |
      v                                        v
  qemu_savevm_send_postcopy_run(fb)
               ---- RUN ------->                 loadvm_postcopy_handle_run
                                                   loadvm_postcopy_handle_run_bh
                                                     vm_start
                                                   return LOADVM_QUIT
The destination and the source interact through a handful of commands.
PACKAGED : this one is interesting; it bundles a whole pile of state into a single blob and sends it in one go (see the sketch below).
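As far as I can tell from migration/savevm.c, the "packing" really is just a length plus the buffered blob on the wire; a trimmed sketch with error checks and tracing removed (details may differ by version):

/* The whole device-state blob that postcopy_start() wrote into 'bioc'
 * is shipped as the payload of a single MIG_CMD_PACKAGED command. */
int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len)
{
    uint32_t tmp = cpu_to_be32(len);    /* payload is just "length + blob" */

    qemu_savevm_command_send(f, MIG_CMD_PACKAGED, 4, (uint8_t *)&tmp);
    qemu_put_buffer(f, buf, len);
    return 0;
}

/* On the destination, loadvm_handle_cmd_packaged() reads the blob into a
 * QIOChannelBuffer and feeds it back through qemu_loadvm_state_main(), so
 * the packaged device state is parsed exactly like a normal stream. */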
Once the destination receives the RUN command, its main thread starts the VM running, and the postcopy_ram_listen_thread spawned when the LISTEN command arrived shoulders the job of receiving the rest of the memory (sketched below).
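The listen thread itself is little more than the ordinary load loop running on the same stream while the guest is already executing; a heavily trimmed sketch of postcopy_ram_listen_thread() from migration/savevm.c (cleanup and error paths omitted):

static void *postcopy_ram_listen_thread(void *opaque)
{
    MigrationIncomingState *mis = migration_incoming_get_current();
    QEMUFile *f = mis->from_src_file;

    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
    qemu_sem_post(&mis->listen_thread_sem);

    /* Keep pulling RAM pages (and the final sections) off the stream
     * while the destination VM is already running. */
    qemu_file_set_blocking(f, true);
    qemu_loadvm_state_main(f, mis);

    /* ... wait for the fault thread, clean up, finish the incoming side ... */
    return NULL;
}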
The fun part is that the destination also keeps a super-simplified state machine to track the progress of postcopy: incoming_postcopy_state (PostcopyState).
+-----------------------------+
| POSTCOPY_INCOMING_NONE      |
| POSTCOPY_INCOMING_ADVISE    |
| POSTCOPY_INCOMING_DISCARD   |
| POSTCOPY_INCOMING_LISTENING |
| POSTCOPY_INCOMING_RUNNING   |
| POSTCOPY_INCOMING_END       |
+-----------------------------+
These state transitions really are dead simple (the tiny helpers driving them are sketched after the diagram):
NONE
  |
  |  MIG_CMD_POSTCOPY_ADVISE
  |
ADVISE
  | \
  |  \  MIG_CMD_POSTCOPY_RAM_DISCARD
  |   \
  |   DISCARD
  |   /
  |  /  MIG_CMD_POSTCOPY_LISTEN
  | /
LISTEN
  |
  |  MIG_CMD_POSTCOPY_RUN
  |
RUNNING
  |
  |
  |
END
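Each destination-side handler advances the state through a small atomic setter and sanity-checks the previous state; roughly, from migration/postcopy-ram.c (the handler snippet below is abridged):

/* Sketch of the state helpers in migration/postcopy-ram.c */
static PostcopyState incoming_postcopy_state;

PostcopyState postcopy_state_get(void)
{
    return atomic_mb_read(&incoming_postcopy_state);
}

PostcopyState postcopy_state_set(PostcopyState new_state)
{
    return atomic_xchg(&incoming_postcopy_state, new_state);
}

/* e.g. loadvm_postcopy_handle_listen() begins with something like: */
PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_LISTENING);
if (ps != POSTCOPY_INCOMING_ADVISE && ps != POSTCOPY_INCOMING_DISCARD) {
    error_report("CMD_POSTCOPY_LISTEN in wrong postcopy state (%d)", ps);
    return -1;
}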
userfaultfd
The important parts have all been covered above; here I'll just touch on the kernel feature everything relies on: userfaultfd.
In other words, this is the interface that lets us register the guest's RAM blocks for fault notification (ram_block_enable_notify), learn when the destination VM touches a page that has not arrived yet (the job of postcopy_ram_fault_thread), and then place the page received from the source into the faulting address while waking up the blocked vCPU.
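To make that concrete, here is a minimal stand-alone toy (not QEMU code) that goes through the same three steps: register a region, wait for a fault in a "fault thread", and resolve it with UFFDIO_COPY. The vcpu_thread/staging names are made up for the demo; on newer kernels you may need root or vm.unprivileged_userfaultfd=1 to open the fd.

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

static char *ram;                       /* stand-in for guest RAM */
static long page_size;

/* Simulated vCPU: the read faults and blocks until the page gets placed */
static void *vcpu_thread(void *arg)
{
    printf("vcpu sees: %c\n", ram[0]);
    return NULL;
}

int main(void)
{
    page_size = sysconf(_SC_PAGESIZE);
    size_t len = 16 * page_size;

    ram = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    int uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
    struct uffdio_api api = { .api = UFFD_API };
    ioctl(uffd, UFFDIO_API, &api);

    /* Register the "RAM block" for missing-page faults, which is what
     * ram_block_enable_notify() does for every RAMBlock. */
    struct uffdio_register reg = {
        .range = { .start = (unsigned long)ram, .len = len },
        .mode  = UFFDIO_REGISTER_MODE_MISSING,
    };
    ioctl(uffd, UFFDIO_REGISTER, &reg);

    pthread_t t;
    pthread_create(&t, NULL, vcpu_thread, NULL);

    /* The fault-thread part: wait until somebody touches a missing page */
    struct uffd_msg msg;
    read(uffd, &msg, sizeof(msg));

    /* Pretend this page just arrived from the migration source */
    char *staging = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    staging[0] = 'X';

    /* Install the page and wake the faulting thread in one ioctl; placing a
     * postcopy page on the destination boils down to exactly this. */
    struct uffdio_copy copy = {
        .dst = msg.arg.pagefault.address & ~(unsigned long)(page_size - 1),
        .src = (unsigned long)staging,
        .len = page_size,
    };
    ioctl(uffd, UFFDIO_COPY, &copy);

    pthread_join(t, NULL);
    return 0;
}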
OK, that's basically it; I'll come back and add more whenever something new comes to mind.