APICV
在qemu/kernel混合模拟的基础上,为了进一步优化中断模拟,硬件提出了APICV。
在上面一小节混合模拟的流程中,其实也会涉及到apicv的代码,但没有仔细考察其作用。现在我们把它拿出来,仔细考察。
硬件知识
既然是硬件的辅助功能,那自然是先学习下硬件都提供了些什么。这部分的内容主要在SDM Chapter 29 APIC VIRTUALIZATION AND VIRTUAL INTERRUPTS。
相关的VM-execution controls
24.6.8 Controls for APIC Virtualization
APIC-access address (64 bits). This field contains the physical address of the 4-KByte APIC-access page
Virtual-APIC address (64 bits). This field contains the physical address of the 4-KByte virtual-APIC page
Certain VM-execution controls enable the processor to virtualize certain accesses to the APIC-access page without a VM exit. In general, this virtualization causes these accesses to be made to the virtual-APIC page instead of the APIC-access page.
TPR threshold (32 bits). Bits 3:0 of this field determine the threshold below which bits 7:4 of VTPR (see Section 29.1.1) cannot fall.
EOI-exit bitmap (4 fields; 64 bits each). These fields are supported only on processors that support the 1- setting of the “virtual-interrupt delivery” VM-execution control. They are used to determine which virtualized writes to the APIC’s EOI register cause VM exits
Posted-interrupt notification vector (16 bits). Its low 8 bits contain the interrupt vector that is used to notify a logical processor that virtual interrupts have been posted. See Section 29.6 for more information on the use of this field.
Posted-interrupt descriptor address (64 bits).
24.4.2 Guest Non-Register State
Guest interrupt status (16 bits) This field is supported only on processors that support the 1-setting of the “virtual-interrupt delivery” VM-execution control.
Requesting virtual interrupt (RVI)(low byte)
Servicing virtual interrupt (SVI)(high byte)
29 APIC VIRTUALIZATION AND VIRTUAL INTERRUPTS
The following are the VM-execution controls relevant to APIC virtualization and virtual interrupts (see Section 24.6 for information about the locations of these controls):
Virtual-interrupt delivery. This controls enables the evaluation and delivery of pending virtual interrupts (Section 29.2). It also enables the emulation of writes (memory-mapped or MSR-based, as enabled) to the APIC registers that control interrupt prioritization.
Use TPR shadow. This control enables emulation of accesses to the APIC’s task-priority register (TPR) via CR8 (Section 29.3) and, if enabled, via the memory-mapped or MSR-based interfaces.
Virtualize APIC accesses. This control enables virtualization of memory-mapped accesses to the APIC (Section 29.4) by causing VM exits on accesses to a VMM-specified APIC-access page. Some of the other controls, if set, may cause some of these accesses to be emulated rather than causing VM exits.
Virtualize x2APIC mode. This control enables virtualization of MSR-based accesses to the APIC (Section 29.5).
APIC-register virtualization. This control allows memory-mapped and MSR-based reads of most APIC registers (as enabled) by satisfying them from the virtual-APIC page. It directs memory-mapped writes to the APIC-access page to the virtual-APIC page, following them by VM exits for VMM emulation.
Process posted interrupts. This control allows software to post virtual interrupts in a data structure and send a notification to another logical processor; upon receipt of the notification, the target processor will process the posted interrupts by copying them into the virtual-APIC page (Section 29.6).
Virtual APIC Page
29.1.1 Virtualized APIC Registers
Virtual task-priority register (VTPR): the 32-bit field located at offset 080H on the virtual-APIC page.
Virtual processor-priority register (VPPR): the 32-bit field located at offset 0A0H on the virtual-APIC page.
Virtual end-of-interrupt register (VEOI): the 32-bit field located at offset 0B0H on the virtual-APIC page.
Virtual interrupt-service register (VISR)
Virtual interrupt-request register (VIRR)
Virtual interrupt-command register (VICR_LO): the 32-bit field located at offset 300H on the virtual-APIC page
Virtual interrupt-command register (VICR_HI): the 32-bit field located at offset 310H on the virtual-APIC page.
图解相关寄存器
vmcs
+----------------------------------+
|guest state area |
| +------------------------------+
| |guest non-register state |
| | +--------------------------+
| | |Guest interrupt status |
| | | +----------------------+
| | | |Requesting virtual |
| | | | interrupt (RVI). |
| | | +----------------------+
| | | |Servicing virtual |
| | | | interrupt (RVI). |
| | | | |
| | +---+----------------------+
| | |
| +------------------------------+
| |
|vm-execution control |
| +------------------------------+
| |APIC-access address |
| | |
| | | 4K Virtual-APIC page
| |Virtual-APIC address ----|-------->+-------------------------+
| | | 080H|Virtual task-priority |
| | | | register (VTPR) |
| | | 0A0H|Vrtl processor-priority |
| | | | register (VPPR) |
| | | 0B0H|Virtual end-of-interrupt |
| | | | register (VEOI) |
| | | |Virtual interrupt-service|
| | | | register (VISR) |
| | | |Virtual interrupt-request|
| | | | register (VIRR) |
| | | 300H|Virtual interrupt-command|
| | | | register(VICR_LO)|
| | | 310H|Virtual interrupt-command|
| | | | register(VICR_HO)|
| | | | |
| | | +-------------------------+
| | |
| | |
| |TPR threshold |
| |EOI-exit bitmap |
| |Posted-interrupt notification |
| | vector |
| | |
| | | 64 byte descriptor
| | | 511 255 0
| |Posted-interrupt descriptor |--->+----------------+----------------+
| | address | | | |
| | | | | |
| | | +----------------+----------------+
| |Pin-Based VM-Execution Ctrls |
| | +-------------------------+
| | |Process posted interrupts|
| | | |
| | +-------------------------+
| | |
| |Primary Processor-Based |
| | VM-Execution Controls |
| | +-------------------------+
| | |Interrupt window exiting |
| | |Use TPR shadow |
| | | |
| | +-------------------------+
| | |
| |Secondary Processor-Based |
| | VM-Execution Controls |
| | +-------------------------+
| | |Virtualize APIC access |
| | |Virtualize x2APIC mode |
| | |APIC-register virtual |
| | |Virtual-intr delivery |
| | | |
| | | |
| | +-------------------------+
| | |
| +------------------------------+
| |
+----------------------------------+
虚拟APIC状态
29.1 VIRTUAL APIC STATE
硬件虚拟化通过Virtualize APIC accesses来指定一块4k内存,用于模拟APIC寄存器的访问和管理中断。
如果没有理解错的话,这个页面在vmx_vcpu_reset函数中设置。
vmx_vcpu_reset
vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, __pa(vcpu->arch.apic->regs));
虚拟中断
29.2 EVALUATION AND DELIVERY OF VIRTUAL INTERRUPTS
虚拟中断包含了中断检测和中断发送。
对virtual-APIC page的操作会触发虚拟中断的检测,如果这个检测得到了一个虚拟中断,则会向guest发送一个中断且不导致guest退出。
虚拟中断检测
一下几种情况将触发虚拟中断的检测:
VM entry(Section 26.3.2.5)
TPR virtualization(Section 29.1.2)
EOI virtualization(Section 29.1.4)
self-IPI virtualization(Section 29.1.5)
posted-interrupt processing(Section 29.6)
检测的伪代码如下:
IF “interrupt-window exiting” is 0 AND
RVI[7:4] > VPPR[7:4] (see Section 29.1.1 for definition of VPPR)
THEN recognize a pending virtual interrupt;
ELSE
do not recognize a pending virtual interrupt;
FI;
虚拟中断发送
虚拟中断发送会改变虚拟机中断状态(RVI/SVI),并且在vmx non-root 模式下发送一个中断,从而不导致虚拟机退出。
中断发送的伪代码如下:
Vector ← RVI; VISR[Vector] ← 1;
SVI ← Vector;
VPPR ← Vector & F0H; VIRR[Vector] ← 0;
IF any bits set in VIRR
THEN RVI ← highest index of bit set in VIRR
ELSE RVI ← 0; FI;
deliver interrupt with Vector through IDT;
cease recognition of any pending virtual interrupt;
看过了硬件提供的能力,我们来看看软件上是如何借助硬件的。
相关软件
检测APICV能力
在kvm代码中首先需要检测硬件是否支持apicv的功能,并作标示。
vmx_init
kvm_init
kvm_arch_hardware_setup
hardware_setup
if (!cpu_has_vmx_apicv()) {
enable_apicv = 0;
kvm_x86_ops->sync_pir_to_irr = NULL;
}
可以看到在启动kvm模块的时候就需要检测apicv是否存在并标示enable_apicv。
那这个cpu_has_vmx_apicv又做了啥?
static inline bool cpu_has_vmx_apicv(void)
{
return cpu_has_vmx_apic_register_virt() &&
cpu_has_vmx_virtual_intr_delivery() &&
cpu_has_vmx_posted_intr();
}
这个就和上文列出了硬件提供的内容匹配了。
何处使用
检测好了硬件的特性,现在就要看在什么地方使用了。当然,我们并没有直接使用enable_apicv这个值,而是把它赋值给了vcpu。
kvm_arch_vcpu_init
vcpu->arch.apicv_active = kvm_x86_ops->get_enable_apicv(vcpu);
return enable_apicv;
聪明的朋友一定已经想到,接下来就是找apicv_active这个值会在哪里判断,判断的地方就是使用apicv功能的地方了。
Posted Interrupt
Posted Interrupt作为一个比较重要的功能,我们单独拿出来研究。
概念
Posted-interrupt processing is a feature by which a processor processes the virtual interrupts by recording them as pending on the virtual-APIC page.
If the “external-interrupt exiting” VM-execution control is 1, any unmasked external interrupt causes a VM exit (see Section 25.2). If the “process posted interrupts” VM-execution control is also 1, this behavior is changed and the processor handles an external interrupt as follows.
The local APIC is acknowledged; this provides the processor core with an interrupt vector, called here the physical vector.
If the physical vector equals the posted-interrupt notification vector, the logical processor continues to the next step. Otherwise, a VM exit occurs as it would normally due to an external interrupt; the vector is saved in the VM-exit interruption-information field. 物理中断向量号等于 posted-interrupt notification vector才继续。
The processor clears the outstanding-notification bit in the posted-interrupt descriptor. This is done atomically so as to leave the remainder of the descriptor unmodified (e.g., with a locked AND operation). 清ON位。
The processor writes zero to the EOI register in the local APIC; this dismisses the interrupt with the posted- interrupt notification vector from the local APIC.清EOI。
The logical processor performs a logical-OR of PIR into VIRR and clears PIR. No other agent can read or write a PIR bit (or group of bits) between the time it is read (to determine what to OR into VIRR) and when it is cleared. PIR->VIRR
The logical processor sets RVI to be the maximum of the old value of RVI and the highest index of all bits that were set in PIR; if no bit was set in PIR, RVI is left unmodified. 计算得到RVI
The logical processor evaluates pending virtual interrupts as described in Section 29.2.1.
简单来说就是原来运行是的虚拟机需要退出来处理中断,现在不退出了,宿主机上用一个特殊的中断将真正的中断注入到虚拟机。真正注入到虚拟机的中断号记录在 PIR (Posted Interrupt Requests)。
软件中断处理的代码
在之前的代码分析中我们也看到过,不过没有着重讲解。在发送中断的内核部分中,我们看到
kvm_irq_delivery_to_apic()
kvm_irq_delivery_to_apic_fast()
kvm_vector_to_index()
kvm_get_vcpu()
kvm_apic_set_irq()
__apic_accept_irq()
...
APIC_DM_FIXED
kvm_lapic_set_vector
kvm_lapic_clear_vector
if (vcpu->arch.apicv_active)
kvm_x86_ops->deliver_posted_interrupt(vcpu, vector);
else
kvm_lapic_set_irr
kvm_make_request(KVM_REQ_EVENT, vcpu)
kvm_vcpu_kick()
...
对于APIC_DM_FIXED中断类型,如果apicv_active为真,则会采用Posted interrupt方式。
这个函数的工作在注释中写得很清楚了。我向着重讲的是pi_test_and_set_pir将真正的中断向量号写在了pir中。
/*
* Send interrupt to vcpu via posted interrupt way.
* 1. If target vcpu is running(non-root mode), send posted interrupt
* notification to vcpu and hardware will sync PIR to vIRR atomically.
* 2. If target vcpu isn't running(root mode), kick it to pick up the
* interrupt from PIR in next vmentry.
*/
static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
int r;
...
if (pi_test_and_set_pir(vector, &vmx->pi_desc))
return;
/* If a previous notification has sent the IPI, nothing to do. */
if (pi_test_and_set_on(&vmx->pi_desc))
return;
if (!kvm_vcpu_trigger_posted_interrupt(vcpu, false))
kvm_vcpu_kick(vcpu);
}
进一步打开kvm_vcpu_trigger_posted_interrupt,发生了什么呢?实际上是向目标vcpu所在的物理cpu上发送了vector为POSTED_INTR_VECTOR的一个中断。当然这个中断就叫Posted Interrupt。
static inline bool kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu,
bool nested)
{
#ifdef CONFIG_SMP
int pi_vec = nested ? POSTED_INTR_NESTED_VECTOR : POSTED_INTR_VECTOR;
if (vcpu->mode == IN_GUEST_MODE) {
/*
* The vector of interrupt to be delivered to vcpu had
* been set in PIR before this function.
*
* Following cases will be reached in this block, and
* we always send a notification event in all cases as
* explained below.
*
* Case 1: vcpu keeps in non-root mode. Sending a
* notification event posts the interrupt to vcpu.
*
* Case 2: vcpu exits to root mode and is still
* runnable. PIR will be synced to vIRR before the
* next vcpu entry. Sending a notification event in
* this case has no effect, as vcpu is not in root
* mode.
*
* Case 3: vcpu exits to root mode and is blocked.
* vcpu_block() has already synced PIR to vIRR and
* never blocks vcpu if vIRR is not cleared. Therefore,
* a blocked vcpu here does not wait for any requested
* interrupts in PIR, and sending a notification event
* which has no effect is safe here.
*/
apic->send_IPI_mask(get_cpu_mask(vcpu->cpu), pi_vec);
return true;
}
#endif
return false;
}
硬件中断处理的代码
在上面的代码中我们可以看到,如果vcpu在guest状态下才会通过post interrupt发送中断。否则还是走kvm_vcpu_kick。
但是对于硬件中断来说,中断发生时直接进入中断处理函数,而不会去判断vcpu状态。这要怎么处理呢?
暂时我能看到是当vcpu处于block状态时,会更换notification vector。也就是由另一个中断函数来响应这个事件。
vcpu_block()
pre_block = vmx_pre_block
pi_pre_block
new.nv = POSTED_INTR_WAKEUP_VECTOR
kvm_vcpu_block()
post_block = vmx_post_block
pi_post_block = __pi_post_block
new.nv = POSTED_INTR_VECTOR
而这个中断处理函数的内容是:
static void wakeup_handler(void)
{
struct kvm_vcpu *vcpu;
int cpu = smp_processor_id();
spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu, cpu),
blocked_vcpu_list) {
struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
if (pi_test_on(pi_desc) == 1)
kvm_vcpu_kick(vcpu);
}
spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
}
暂时还不是特别理解,待我以后好好研究。
Last updated
Was this helpful?