Annotated arch/i386/kernel/head.S

2020-05-19 13:27:40

'kernel/head.S' detailed commentary

/***************************************************************************
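32-bit startup code:
When the linker builds the kernel image vmlinux, it places the start of vmlinux at
0xC0000000 + 0x100000, so every symbol's address = 0xC0000000 + 0x100000 + the
symbol's offset within the compiled image. Consequently, before paging is enabled,
code that needs the physical address of a symbol must subtract 0xC0000000. A few
points deserve explanation:

1. When vmlinux is loaded, it is placed at physical address 0x100000. When the CPU
executes these instructions, e.g. the first instruction in this file, the code
segment is a flat segment with base 0 and EIP = 0x100000. Since that instruction
really is at physical address 0x100000, the CPU simply executes sequentially and
the code needs no adjustment.

2. When linking vmlinux, however, the linker replaces every symbol in this source
file (for example empty_zero_page) with a fixed value in the final binary:
    0xC0000000 + 0x100000 + the symbol's offset in the final binary
Before paging is enabled, accessing that address directly fails, because an
address at 3G + offset generally does not exist physically. To reach the physical
location of the data a symbol names, subtract 0xC0000000: this leaves the
symbol's offset in the image + 0x100000, and because the kernel is loaded at
exactly 0x100000, that value is the symbol's physical address.

The relevant settings are in arch/i386/vmlinux.lds; objdump -D vmlinux lists all
symbols together with their addresses.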
***************************************************************************/
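To make the arithmetic above concrete, here is a minimal C sketch using the values quoted in this article (PAGE_OFFSET = 0xC0000000, load address 0x100000). The helper names link_addr and phys_addr are hypothetical and do not exist in the kernel; the example offset 0x4000 is the .org offset of empty_zero_page used later in this file.

```c
#include <stdint.h>

// Values quoted in the text (i386, Linux 2.4-era layout).
#define PAGE_OFFSET 0xC0000000u  // start of the kernel's virtual window
#define LOAD_PHYS   0x00100000u  // physical load address of vmlinux (1 MB)

// Hypothetical helper: the address the linker assigns to a symbol that sits
// `offset` bytes into the kernel image.
static uint32_t link_addr(uint32_t offset)
{
    return PAGE_OFFSET + LOAD_PHYS + offset;
}

// Hypothetical helper: before paging is on, the symbol can only be reached
// through its physical address, i.e. the link address minus PAGE_OFFSET.
static uint32_t phys_addr(uint32_t link)
{
    return link - PAGE_OFFSET;
}
```

With these values, offset 0x4000 is linked at 0xC0104000 and, before paging, is reachable only at 0x104000, which is what $empty_zero_page-__PAGE_OFFSET computes in the page-table loop.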

.text
#include <linux/config.h>
#include <linux/threads.h>
#include <linux/linkage.h>
#include <asm/segment.h>
#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/desc.h>

#define OLD_CL_MAGIC_ADDR 0x90020
#define OLD_CL_MAGIC 0xA33F
#define OLD_CL_BASE_ADDR 0x90000
#define OLD_CL_OFFSET 0x90022
#define NEW_CL_POINTER 0x228 /* Relative to real mode data */

/*
* References to members of the boot_cpu_data structure.
*/

#define CPU_PARAMS SYMBOL_NAME(boot_cpu_data)
#define X86 CPU_PARAMS+0
#define X86_VENDOR CPU_PARAMS+1
#define X86_MODEL CPU_PARAMS+2
#define X86_MASK CPU_PARAMS+3
#define X86_HARD_MATH CPU_PARAMS+6
#define X86_CPUID CPU_PARAMS+8
#define X86_CAPABILITY CPU_PARAMS+12
#define X86_VENDOR_ID CPU_PARAMS+16

/*
* swapper_pg_dir is the main page directory, address 0x00101000
* On entry, %esi points to the real-mode code as a 32-bit pointer (0x90000).
*/
ENTRY(stext)
ENTRY(_stext)
startup_32:
/*
* Set segments to known values
*/
cld
movl $(__KERNEL_DS),%eax
movl %eax,%ds
movl %eax,%es
movl %eax,%fs
movl %eax,%gs
#ifdef CONFIG_SMP
......
#endif
/***************************************************************************
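#define __PAGE_OFFSET (0xC0000000)

This initializes the two page tables pg0 and pg1. Layout of each page-table entry:

 31                  12 11    9  8   7   6   5   4   3   2   1   0
+----------------------+-------+---+---+---+---+---+---+---+---+---+
|   page base address  | avail | G |PAT| D | A |PCD|PWT|U/S|R/W| P |
+----------------------+-------+---+---+---+---+---+---+---+---+---+

* avail: bits 11..9 are free for the OS to use
* G: global page (kept in the TLB across CR3 reloads when CR4.PGE is set)
* PAT: page-attribute-table index on PAT-capable CPUs, otherwise reserved (0).
  It is not a page-size bit: the 4K/4M choice is the PS bit of the
  page-directory entry.
* D: dirty; set by the CPU when the page is written. The CPU never clears it;
  software clears it once it no longer needs to track the write.
* A: accessed; set by the CPU when the page is read or written
* PCD: 1 = caching disabled for this page
* PWT: 1 = write-through caching, 0 = write-back
* U/S: 0 = supervisor only, 1 = user accessible
* R/W: 0 = read-only, 1 = writable
* P: 0 = the page is not present in memory

1. First load the physical address of pg0 into edi.
2. Set eax to 007 so that every entry gets P, R/W and U/S set, i.e.
   PRESENT+RW+USER.
3. Set the base fields of the 2048 entries in pg0/pg1 to 0, 1, 2, 3, 4, ..., 2047,
   so that physical 0-8M is mapped page by page through entries 0, 1, 2, 3, 4, ...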
***************************************************************************/
movl $pg0-__PAGE_OFFSET,%edi
movl $007,%eax
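The constant 007 can be checked against the bit list above. A small sketch with hypothetical macro and helper names (bit positions taken from the diagram) shows that 007 sets exactly P, R/W and U/S, and how an entry splits into frame address and flags:

```c
#include <stdint.h>

// Low flag bits of an i386 page-table entry (positions from the diagram).
#define PTE_P   (1u << 0)  // present
#define PTE_RW  (1u << 1)  // writable
#define PTE_US  (1u << 2)  // user accessible
#define PTE_PWT (1u << 3)  // write-through caching
#define PTE_PCD (1u << 4)  // caching disabled
#define PTE_A   (1u << 5)  // accessed
#define PTE_D   (1u << 6)  // dirty
#define PTE_PAT (1u << 7)  // PAT index (PAT-capable CPUs only)
#define PTE_G   (1u << 8)  // global

// Physical address of the 4 KB page frame: bits 31..12.
static uint32_t pte_frame(uint32_t pte) { return pte & 0xFFFFF000u; }

// Flag part: bits 11..0 (including the OS-available bits 11..9).
static uint32_t pte_flags(uint32_t pte) { return pte & 0x00000FFFu; }
```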

/***************************************************************************
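stosl stores EAX at ES:(E)DI and then adds 4 to EDI, i.e. it writes the value in
eax to the address ES:(E)DI points to.
cmp $empty_zero_page-__PAGE_OFFSET,%edi computes
%edi - ($empty_zero_page-__PAGE_OFFSET): if the two are equal, ZF = 1,
otherwise ZF = 0.

This is therefore a loop that fills every page-table entry between offsets
2000H and 4000H (physical 0x102000-0x104000, i.e. pg0 and pg1) and exits once
edi reaches empty_zero_page. Afterwards the entries contain:

  entry   physical page base   flags
  0       0000*4096            007
  1       0001*4096            007
  2       0002*4096            007
  3       0003*4096            007
  ...     ...                  ...
  2047    2047*4096            007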
***************************************************************************/
2: stosl
add $0x1000,%eax
cmp $empty_zero_page-__PAGE_OFFSET,%edi
jne 2b
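The loop can be modelled in C to see what it leaves behind. This sketch assumes, as the text says, that pg0 and pg1 are two consecutive 4 KB tables (2048 entries) ending where empty_zero_page begins; fill_boot_ptes is a hypothetical name.

```c
#include <stdint.h>

#define PTES_PER_TABLE 1024u
#define NR_BOOT_TABLES 2u      // pg0 and pg1
#define PAGE_SIZE      0x1000u

// Mirrors the stosl / add $0x1000 loop: eax starts at 0x007 (frame 0,
// PRESENT|RW|USER) and advances by one page per entry until edi reaches
// empty_zero_page, i.e. after 2048 entries.
static void fill_boot_ptes(uint32_t tables[PTES_PER_TABLE * NR_BOOT_TABLES])
{
    uint32_t eax = 0x007;
    for (uint32_t i = 0; i < PTES_PER_TABLE * NR_BOOT_TABLES; i++) {
        tables[i] = eax;   // stosl
        eax += PAGE_SIZE;  // add $0x1000,%eax
    }
}
```

Entry 2047 ends up as 0x007FF007, the last 4 KB page below 8M.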

/***************************************************************************
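Enable paging:

Load CR3 with the physical address of the page directory, 0x101000. The base
field (bits 31..12) thus holds 0x101000/4096 = 0x101, the directory's page
number, and the other CR3 flags are implicitly initialized to 0. CR3 always
holds the base address of the page directory.

 31                  12 11         5  4   3  2        0
+----------------------+------------+---+---+----------+
| page-directory base  |  reserved  |PCD|PWT| reserved |  CR3
+----------------------+------------+---+---+----------+
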
***************************************************************************/

3:
movl $swapper_pg_dir-__PAGE_OFFSET,%eax
movl %eax,%cr3 /* set the page table pointer.. */
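CR3 can be assembled the same way. A hedged sketch (make_cr3 and cr3_pgdir_frame are hypothetical helpers; bit positions from the CR3 diagram): because swapper_pg_dir is 4 KB aligned, loading its physical address 0x101000 leaves PCD, PWT and the reserved bits at 0, and the base field holds 0x101.

```c
#include <stdint.h>

#define CR3_PWT (1u << 3)  // page-directory write-through
#define CR3_PCD (1u << 4)  // page-directory cache disable

// Build a CR3 value from the directory's physical base (4 KB aligned) plus
// the optional cache-control bits; head.S sets neither.
static uint32_t make_cr3(uint32_t pgdir_phys, int pcd, int pwt)
{
    return (pgdir_phys & 0xFFFFF000u) | (pcd ? CR3_PCD : 0u) | (pwt ? CR3_PWT : 0u);
}

// The 20-bit page number stored in bits 31..12.
static uint32_t cr3_pgdir_frame(uint32_t cr3) { return cr3 >> 12; }
```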

/***************************************************************************
Set the PG bit (bit 31) of CR0 to 1. As soon as this bit is set, the CPU starts
translating addresses through the page tables. CR0:
  31  30                            0
+----+-------------------------------+
| PG |        other CR0 bits         |
+----+-------------------------------+
***************************************************************************/
movl %cr0,%eax
orl $0x80000000,%eax
movl %eax,%cr0 /* ..and set paging (PG) bit */
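Setting PG is a single OR that leaves the other CR0 bits alone. A sketch with a hypothetical helper name; 0x11 (PE plus ET) is only an illustrative starting value:

```c
#include <stdint.h>

#define CR0_PE (1u << 0)   // protected mode, already on when head.S runs
#define CR0_PG (1u << 31)  // paging enable

// orl $0x80000000,%eax: turn on paging, preserve every other CR0 bit.
static uint32_t cr0_enable_paging(uint32_t cr0) { return cr0 | CR0_PG; }
```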

/******************************************************************************
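This jump is recommended by the Intel 386 manual: it discards the instructions
already in the CPU's prefetch queue.

Why a jump? The CPU prefetches and decodes instructions ahead of execution
(instruction pipeline / prefetch queue). The CPU is already in protected mode
here; what changes is address translation: instructions fetched before the
write to CR0 took effect were fetched without paging, so they must be thrown
away. Any instruction that transfers control (jmp, jxx, call, ret, int, iret,
...) flushes the queue. (The same trick is used when switching from real mode to
protected mode, where the reason is that the same bytes decode differently in
the two modes.)

Up to this point EIP still lies in the physical region just above 100000H,
which keeps working because pg0/pg1 identity-map 0-8M. Every kernel symbol,
however, is linked above C0100000H, so with paging enabled EIP should be moved
above 0xC0100000. The following instructions do that:
    movl $1f,%eax
    jmp *%eax
$1f is the link-time address of label 1, 0xC01000xx. The '*' after jmp tells gas
(and the CPU) that this is an absolute indirect jump: EIP is loaded with the
value in %eax, i.e. 0xC01000xx, so EIP is relocated above 0xC0100000.

By contrast, jmp 1f is a relative jump: gas encodes the difference between 1f
and the address of the next instruction, and the CPU adds that difference to the
current EIP, so EIP is not reloaded with a high address.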
*******************************************************************************/

jmp 1f /* flush the prefetch-queue */
1:
movl $1f,%eax
jmp *%eax /* make sure eip is relocated */
1:
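The difference between the two jumps can be modelled numerically. In this sketch all addresses are made up for illustration (label 1 linked at 0xC0100040, the relative jump's next instruction linked at 0xC0100030) and the helper names are hypothetical:

```c
#include <stdint.h>

// A relative jump encodes rel = target - (address of the next instruction),
// both taken from the link-time layout.
static int32_t rel_displacement(uint32_t target_link, uint32_t next_insn_link)
{
    return (int32_t)(target_link - next_insn_link);
}

// At run time the CPU adds rel to the EIP it actually has, so code running
// at physical 0x001000xx stays in that region.
static uint32_t eip_after_relative_jump(uint32_t eip_next, int32_t rel)
{
    return eip_next + (uint32_t)rel;
}

// jmp *%eax loads EIP with the absolute link-time address held in %eax.
static uint32_t eip_after_absolute_jump(uint32_t eax) { return eax; }
```

The relative jump lands at 0x00100040, still in the identity-mapped low region; only the absolute jump puts EIP at 0xC0100040.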
/******************************************************************************
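Set up the stack pointer:

lss stack_start,%esp loads SS with __KERNEL_DS (the second long at stack_start,
the same selector as DS) and ESP with init_task_union+8192. This gives the 8K
kernel stack, whose top is the address just past the last element of
init_task_union.stack:
union task_union {
struct task_struct task;
unsigned long stack[INIT_TASK_SIZE/sizeof(long)];
};

|            |
|------------|<--- ESP = &init_task_union.stack[2048] (long = 4 bytes)
|            |
|   stack    |     grows downward
|            |
|............|
| task_struct|
|____________|<--- init_task_union (bottom of the 8K union)

SS uses exactly the same segment descriptor as DS. The ED (expand-down) bit in a
data descriptor's type describes how the segment limit is checked, not the
direction in which pushes move the stack pointer. The flat kernel data
descriptor (access byte 0x92) has ED = 0, an ordinary expand-up segment, and
push still decrements ESP. The +8192 also shows that the kernel stack is 8K.

This stack remains the Linux kernel's stack from here on.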
******************************************************************************/
lss stack_start,%esp
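The stack-top arithmetic can be reproduced with a stand-in union. This sketch assumes INIT_TASK_SIZE is 8192 (matching the +8192 in stack_start) and replaces struct task_struct with a hypothetical 1 KB placeholder, since only the union's size matters here; initial_esp is also a hypothetical name.

```c
#include <stdint.h>
#include <stddef.h>

#define INIT_TASK_SIZE 8192  // assumption: two 4 KB pages, as in stack_start

struct task_struct_stub { char opaque[1024]; };  // hypothetical placeholder

union task_union {
    struct task_struct_stub task;                        // bottom of the 8K
    unsigned long stack[INIT_TASK_SIZE / sizeof(long)];  // same 8K
};

static union task_union init_task_union;

// ESP loaded by "lss stack_start,%esp": one past the last stack element.
static void *initial_esp(void)
{
    return (char *)&init_task_union + INIT_TASK_SIZE;
}
```

Because task and stack share the same 8K, the stack grows down from the top toward the task_struct, so the usable stack is 8K minus the size of the task structure.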

#ifdef CONFIG_SMP
......
#endif CONFIG_SMP

/******************************************************************************
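Clear BSS first so that there are no surprises...
No need to cld as DF is already clear from cld above...

Zero the whole BSS. __bss_start and _end are defined in vmlinux.lds; they are
link-time (virtual) addresses above 0xC0000000, which can be used directly now
that paging is enabled and page-directory entries 768/769 map them.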
******************************************************************************/
xorl %eax,%eax
movl $ SYMBOL_NAME(__bss_start),%edi
movl $ SYMBOL_NAME(_end),%ecx
subl %edi,%ecx
rep /* rep stosb: Fill (E)CX bytes at ES:[(E)DI] with AL.*/
stosb
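In C, the BSS clear is a plain memset between the two linker symbols. A sketch with a hypothetical helper that takes the bounds as pointers instead of the real __bss_start/_end:

```c
#include <stdint.h>
#include <string.h>

// Equivalent of: xorl %eax,%eax; movl $__bss_start,%edi; movl $_end,%ecx;
// subl %edi,%ecx; rep stosb
static void clear_bss(unsigned char *bss_start, unsigned char *end)
{
    size_t count = (size_t)(end - bss_start);  // ecx = _end - __bss_start
    memset(bss_start, 0, count);               // rep stosb with al = 0
}
```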

/******************************************************************************
Now begin building the 32-bit system. Some of what was done for the 16-bit system in 16-bit mode has to be repeated here.
******************************************************************************/
call setup_idt /* set up the interrupt descriptor table (IDT) */
/******************************************************************************
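Initialize EFLAGS (ideally this would be done before switching to protected mode).
Pushing 0 and popping it into EFLAGS clears the flags. compressed/head.S already
does the same; it is repeated here because, by now, the various computations
have left many flags non-zero.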
******************************************************************************/
pushl $0
popfl
/******************************************************************************
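Note: %esi still holds the pointer to the real-mode data (0x90000).

Copy bootup parameters out of the way. First 2kB of empty_zero_page is for
boot parameters, second 2kB is for the command line.

Copy the first 2K of the real-mode data at 0x90000 into the first 2K of
empty_zero_page, then zero the following 2K. Why can esi = 90000H still be used
after paging is on? Address 90000H has page-directory index 0, and entry 0 of the
directory points to pg0; entry 90H of pg0 maps page 90H*4096 = 90000H. Linear and
physical addresses are therefore identical here. This is exactly why entries 0
and 1 of swapper_pg_dir identity-map physical memory: addresses left over from
real mode still reach the right physical data.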
******************************************************************************/
movl $ SYMBOL_NAME(empty_zero_page),%edi
movl $512,%ecx
cld
rep
movsl /* rep movsl: move (E)CX doublewords from DS:[(E)SI] to ES:[(E)DI]; esi += 4, edi += 4 per move */
xorl %eax,%eax
movl $512,%ecx
rep
stosl /* rep stosl: fill (E)CX doublewords at ES:[(E)DI] with EAX; edi += 4 per store */
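The index arithmetic behind the identity-mapping argument can be written out. A sketch of two-level i386 paging with 4 KB pages; boot_translate is a hypothetical helper that resolves an address only through the boot mappings described in this file (directory entries 0/1 and 768/769 pointing to pg0/pg1, which map page n to physical n*4096):

```c
#include <stdint.h>

// 10-bit directory index, 10-bit table index, 12-bit page offset.
static uint32_t pde_index(uint32_t lin) { return lin >> 22; }
static uint32_t pte_index(uint32_t lin) { return (lin >> 12) & 0x3FFu; }
static uint32_t page_off(uint32_t lin)  { return lin & 0xFFFu; }

// Returns 0xFFFFFFFF when the boot tables do not map the address.
static uint32_t boot_translate(uint32_t lin)
{
    uint32_t dir = pde_index(lin);
    uint32_t table;
    if (dir == 0 || dir == 1)
        table = dir;            // identity mapping of 0-8M
    else if (dir == 768 || dir == 769)
        table = dir - 768;      // same tables seen at PAGE_OFFSET
    else
        return 0xFFFFFFFFu;     // directory entry is zero: not present
    return (table * 1024u + pte_index(lin)) * 4096u + page_off(lin);
}
```

boot_translate(0x90000) gives 0x90000 and boot_translate(0xC0104000) gives 0x104000, which is why both the real-mode pointer in %esi and the virtual address of empty_zero_page work at this point.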

/******************************************************************************
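Copy the command line. The following definitions in arch/i386/kernel/setup.c show how the boot parameters in empty_zero_page are accessed; the command line itself lives at PARAM+2048: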
#define PARAM ((unsigned char *)empty_zero_page)
#define SCREEN_INFO (*(struct screen_info *) (PARAM+0))
#define EXT_MEM_K (*(unsigned short *) (PARAM+2))
#define ALT_MEM_K (*(unsigned long *) (PARAM+0x1e0))
#define E820_MAP_NR (*(char*) (PARAM+E820NR))
#define E820_MAP ((struct e820entry *) (PARAM+E820MAP))
#define APM_BIOS_INFO (*(struct apm_bios_info *) (PARAM+0x40))
#define DRIVE_INFO (*(struct drive_info_struct *) (PARAM+0x80))
#define SYS_DESC_TABLE (*(struct sys_desc_table_struct*)(PARAM+0xa0))
#define MOUNT_ROOT_RDONLY (*(unsigned short *) (PARAM+0x1F2))
#define RAMDISK_FLAGS (*(unsigned short *) (PARAM+0x1F8))
#define ORIG_ROOT_DEV (*(unsigned short *) (PARAM+0x1FC))
#define AUX_DEVICE_INFO (*(unsigned char *) (PARAM+0x1FF))
#define LOADER_TYPE (*(unsigned char *) (PARAM+0x210))
#define KERNEL_START (*(unsigned long *) (PARAM+0x214))
#define INITRD_START (*(unsigned long *) (PARAM+0x218))
#define INITRD_SIZE (*(unsigned long *) (PARAM+0x21c))
#define COMMAND_LINE ((char *) (PARAM+2048))
#define COMMAND_LINE_SIZE 256
#define RAMDISK_IMAGE_START_MASK 0x07FF
#define RAMDISK_PROMPT_FLAG 0x8000
#define RAMDISK_LOAD_FLAG 0x4000

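esi = the 32-bit value stored at empty_zero_page+0x228 (NEW_CL_POINTER). The movl
below uses a memory operand, not an immediate: it loads the pointer that the boot
loader placed at offset 0x228 of the real-mode data, i.e. the physical address
of the command line.
edi = empty_zero_page+0x800

If that pointer is non-zero (new protocol), copy 512*4 = 2K bytes from it to
empty_zero_page+2048. Otherwise, if the word at 90020H is OLD_CL_MAGIC (0xA33F),
the command line is at 90000H plus the 16-bit offset stored at 90022H (old
protocol). If neither applies, the 2K command-line area stays zeroed.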
******************************************************************************/
movl SYMBOL_NAME(empty_zero_page)+NEW_CL_POINTER,%esi
andl %esi,%esi
jnz 2f /* New command line protocol: the pointer is non-zero, so jump */
/* JNZ: jump short if not zero (ZF=0), i.e. if the result just computed is non-zero */
cmpw $(OLD_CL_MAGIC),OLD_CL_MAGIC_ADDR /* is the word at 90020H 0xA33F? (reachable through the identity mapping in pg0) */
jne 1f
movzwl OLD_CL_OFFSET,%esi
addl $(OLD_CL_BASE_ADDR),%esi
2:
movl $ SYMBOL_NAME(empty_zero_page)+2048,%edi
movl $512,%ecx
rep
movsl /* rep movsl: move (E)CX doublewords from DS:[(E)SI] to ES:[(E)DI]; esi += 4, edi += 4 per move */
1:
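The selection logic above, written as C for clarity. cmdline_source is a hypothetical helper; it takes the copied boot-parameter page plus the two 16-bit words at 0x90020/0x90022, and reads the pointer little-endian as the x86 does:

```c
#include <stdint.h>

// Constants from the top of head.S.
#define OLD_CL_MAGIC     0xA33Fu
#define OLD_CL_BASE_ADDR 0x90000u
#define NEW_CL_POINTER   0x228u  // offset of the 32-bit pointer in the real-mode data

// Returns the physical address to copy the command line from, or 0 if there
// is none (head.S then leaves the 2K area zeroed).
static uint32_t cmdline_source(const uint8_t *params,
                               uint16_t magic_at_90020, uint16_t offset_at_90022)
{
    uint32_t ptr = (uint32_t)params[NEW_CL_POINTER]
                 | (uint32_t)params[NEW_CL_POINTER + 1] << 8
                 | (uint32_t)params[NEW_CL_POINTER + 2] << 16
                 | (uint32_t)params[NEW_CL_POINTER + 3] << 24;  // little-endian load
    if (ptr != 0)                          // new protocol
        return ptr;
    if (magic_at_90020 == OLD_CL_MAGIC)    // old protocol: movzwl + addl
        return OLD_CL_BASE_ADDR + offset_at_90022;
    return 0;                              // no command line
}
```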
#ifdef CONFIG_SMP
checkCPUtype:
#endif

/******************************************************************************
Check the CPU type. First set boot_cpu_data.cpuid_level to -1 (no CPUID). For reference:

struct cpuinfo_x86 {
__u8 x86; /* CPU family */
__u8 x86_vendor; /* CPU vendor */
__u8 x86_model;
__u8 x86_mask;
char wp_works_ok; /* It doesn't on 386's */
char hlt_works_ok; /* Problems on some 486Dx4's and old 386's */
char hard_math;
char rfu;
int cpuid_level; /* Maximum supported CPUID level, -1=no CPUID */
__u32 x86_capability[NCAPINTS];
char x86_vendor_id[16];
char x86_model_id[64];
int x86_cache_size; /* in KB - valid for CPUS which support this call */
int fdiv_bug;
int f00f_bug;
int coma_bug;
unsigned long loops_per_jiffy;
unsigned long *pgd_quick;
unsigned long *pmd_quick;
unsigned long *pte_quick;
unsigned long pgtable_cache_sz;
};
#define CPU_PARAMS SYMBOL_NAME(boot_cpu_data)
#define X86 CPU_PARAMS+0
#define X86_VENDOR CPU_PARAMS+1
#define X86_MODEL CPU_PARAMS+2
#define X86_MASK CPU_PARAMS+3
#define X86_HARD_MATH CPU_PARAMS+6
#define X86_CPUID CPU_PARAMS+8
#define X86_CAPABILITY CPU_PARAMS+12
#define X86_VENDOR_ID CPU_PARAMS+16
******************************************************************************/
movl $-1,X86_CPUID /* -1 for no CPUID initially */
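The X86_* offsets can be checked with offsetof. This sketch mirrors only the leading fields of the struct quoted above and assumes a 4-byte int and NCAPINTS == 1; x86_vendor_id then sits at 12 + 4*NCAPINTS, so the +16 in X86_VENDOR_ID holds only for a single capability word, and a kernel with a larger NCAPINTS would need a larger offset in head.S.

```c
#include <stddef.h>
#include <stdint.h>

#define NCAPINTS 1  // assumption: the only value consistent with CPU_PARAMS+16

// Leading fields of struct cpuinfo_x86 as quoted above (the rest omitted).
struct cpuinfo_x86_head {
    uint8_t  x86, x86_vendor, x86_model, x86_mask;
    char     wp_works_ok, hlt_works_ok, hard_math, rfu;
    int      cpuid_level;
    uint32_t x86_capability[NCAPINTS];
    char     x86_vendor_id[16];
};
```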

/*
* check if it is 486 or 386.
* XXX - this does a lot of unnecessary setup. Alignment checks don't
* apply at our cpl of 0 and the stack ought to be aligned already, and
* we don't need to preserve eflags.
*/

/* Is it a 386? First set boot_cpu_data.x86 = 3 (386) */
movl $3,X86 # at least 386
pushfl # push EFLAGS
popl %eax # get EFLAGS
movl %eax,%ecx # save original EFLAGS
xorl $0x40000,%eax # flip AC bit in EFLAGS
pushl %eax # copy to EFLAGS
popfl # set EFLAGS
pushfl # get new EFLAGS
popl %eax # put it in eax
xorl %ecx,%eax # change in flags
andl $0x40000,%eax # check if AC bit changed
je is386 # AC bit could not be changed: it is a 386, jump

/* Is it a 486? Set boot_cpu_data.x86 = 4 (486) */
movl $4,X86 # at least 486
movl %ecx,%eax
xorl $0x200000,%eax # check ID flag
pushl %eax
popfl # if we are on a straight 486DX, SX, or
pushfl # 487SX we can't change it
popl %eax
xorl %ecx,%eax
pushl %ecx # restore original EFLAGS
popfl
andl $0x200000,%eax
je is486 # ID bit could not be changed: a 486 without CPUID, jump
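The 386/486 test reduces to "did the flipped bit stick?". A sketch with hypothetical helper names that models the EFLAGS values read back by pushfl/popl rather than executing them:

```c
#include <stdint.h>

#define EFLAGS_AC (1u << 18)  // 0x40000: cannot be changed on a 386
#define EFLAGS_ID (1u << 21)  // 0x200000: changeable only if CPUID exists

// Did the bit differ after the flip-and-read-back round trip?
static int bit_toggled(uint32_t original, uint32_t read_back, uint32_t mask)
{
    return ((original ^ read_back) & mask) != 0;
}

// Follows head.S: 3 for a 386, 4 for a 486 without CPUID, 0 meaning
// "CPUID is available, ask it".
static int classify(uint32_t orig, uint32_t after_ac_flip, uint32_t after_id_flip)
{
    if (!bit_toggled(orig, after_ac_flip, EFLAGS_AC)) return 3;
    if (!bit_toggled(orig, after_id_flip, EFLAGS_ID)) return 4;
    return 0;
}
```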

/* get vendor info: the ID bit can be changed, so CPUID exists; save the vendor ID in boot_cpu_data */
xorl %eax,%eax # call CPUID with 0 -> return vendor ID
cpuid
movl %eax,X86_CPUID # save CPUID level
movl %ebx,X86_VENDOR_ID # lo 4 chars
movl %edx,X86_VENDOR_ID+4 # next 4 chars
movl %ecx,X86_VENDOR_ID+8 # last 4 chars

orl %eax,%eax # do we have processor info as well?
je is486

/* get the CPU type */
movl $1,%eax # Use the CPUID instruction to get CPU type
cpuid
movb %al,%cl # save reg for future use
andb $0x0f,%ah # mask processor family
movb %ah,X86
andb $0xf0,%al # mask model
shrb $4,%al
movb %al,X86_MODEL
andb $0x0f,%cl # mask mask revision
movb %cl,X86_MASK
movl %edx,X86_CAPABILITY
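How the CPUID results are unpacked, as a C sketch (helper names hypothetical). Leaf 0 returns the vendor string in EBX, EDX, ECX order, lowest byte first; leaf 1's EAX holds stepping, model and family in bits 3..0, 7..4 and 11..8, the fields head.S stores in X86_MASK, X86_MODEL and X86:

```c
#include <stdint.h>

// Reassemble the 12-character vendor string from leaf 0.
static void vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
    const uint32_t regs[3] = { ebx, edx, ecx };
    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xFF);
    out[12] = '\0';
}

struct cpu_sig { uint8_t family, model, stepping; };

// Decode leaf 1's EAX the way head.S does.
static struct cpu_sig decode_signature(uint32_t eax)
{
    struct cpu_sig s;
    s.family   = (eax >> 8) & 0x0F;  // andb $0x0f,%ah
    s.model    = (eax >> 4) & 0x0F;  // andb $0xf0,%al; shrb $4,%al
    s.stepping = eax & 0x0F;         // andb $0x0f,%cl
    return s;
}
```

With EBX/EDX/ECX = 0x756e6547/0x49656e69/0x6c65746e the string is "GenuineIntel", and EAX = 0x633 decodes to family 6, model 3, stepping 3.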

is486:
movl %cr0,%eax # 486 or better
andl $0x80000011,%eax # Save PG,PE,ET
orl $0x50022,%eax # set AM, WP, NE and MP
jmp 2f

is386: pushl %ecx # restore original EFLAGS
popfl
movl %cr0,%eax # 386
andl $0x80000011,%eax # Save PG (paging), PE (protected mode), ET (extension type)
orl $2,%eax # set MP
2: movl %eax,%cr0
call check_x87
#ifdef CONFIG_SMP
incb ready
#endif
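The two CR0 masks decoded into named bits. A sketch with hypothetical helper names; the bit positions are the architectural CR0 bits:

```c
#include <stdint.h>

#define CR0_PE (1u << 0)   // protected mode enable
#define CR0_MP (1u << 1)   // monitor coprocessor
#define CR0_ET (1u << 4)   // extension type
#define CR0_NE (1u << 5)   // native FPU error reporting (486+)
#define CR0_WP (1u << 16)  // supervisor write protect (486+)
#define CR0_AM (1u << 18)  // alignment mask (486+)
#define CR0_PG (1u << 31)  // paging

// is486 path: andl $0x80000011 then orl $0x50022.
static uint32_t cr0_for_486(uint32_t cr0)
{
    return (cr0 & (CR0_PG | CR0_ET | CR0_PE)) | (CR0_AM | CR0_WP | CR0_NE | CR0_MP);
}

// is386 path: andl $0x80000011 then orl $2.
static uint32_t cr0_for_386(uint32_t cr0)
{
    return (cr0 & (CR0_PG | CR0_ET | CR0_PE)) | CR0_MP;
}
```

Starting from 0x80000011 (PG, ET, PE), the 486 path yields 0x80050033 and the 386 path 0x80000013.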

/* load the global descriptor table and the interrupt descriptor table */
lgdt gdt_descr
lidt idt_descr

/* reload all segment registers */
ljmp $(__KERNEL_CS),$1f
1: movl $(__KERNEL_DS),%eax
movl %eax,%ds
movl %eax,%es
movl %eax,%fs
movl %eax,%gs
#ifdef CONFIG_SMP
......
#else
lss stack_start,%esp # Load processor stack (set up the stack again)
#endif
xorl %eax,%eax
lldt %ax # LLDT: load the LDT register; ax = 0, so no LDT is used

#Clears the DF flag in the EFLAGS register.
cld # gcc2 wants the direction flag cleared at all times
#ifdef CONFIG_SMP
.......
#endif
# enter the kernel proper: start_kernel
call SYMBOL_NAME(start_kernel)

L6:
.......
ret

/******************************************************************************
setup_idt:

sets up an idt with 256 entries pointing to ignore_int, interrupt gates. It
doesn't actually load idt - that can be done only after paging has been enabled
and the kernel moved to PAGE_OFFSET. Interrupts are enabled elsewhere, when we
can be relatively sure everything is ok.

struct desc_struct idt_table[256]
__attribute__((__section__(".data.idt"))) = { {0, 0}, };

Layout of an interrupt-gate descriptor:

 31              16 15 14 13 12        8 7   5 4      0
+------------------+--+-----+-----------+-----+--------+
|  offset 31..16   |P | DPL | 0 D 1 1 0 |0 0 0|reserved|  high 4 bytes
+------------------+--+-----+-----------+-----+--------+
 31              16 15                                 0
+------------------+-----------------------------------+
| segment selector |           offset 15..0            |  low 4 bytes
+------------------+-----------------------------------+
Here D: 1 = 32-bit gate, 0 = 16-bit gate
     P: 1 = present
     DPL: descriptor privilege level

The code below therefore sets the segment selector of all 256 IDT entries to
__KERNEL_CS, the handler address to ignore_int, and the flag word to 0x8E00,
i.e. P=1, DPL=0, D=1.

Each entry's handler only prints "Unknown interrupt".
Note: the offset stored in each entry is ignore_int's link-time (virtual)
address, i.e. an offset within the flat kernel code segment (base 0).
******************************************************************************/

/******************************************************************************
1: Prepare the initial value shared by all IDT entries. Put the 32-bit address of
the default handler ignore_int into edx; edx will become the high 32 bits of each
entry.
******************************************************************************/
setup_idt:
lea ignore_int,%edx /* lea m,r32: store the effective address of m in r32 */

/* Put the code segment selector into the high 16 bits of eax; eax will become the low 32 bits of each IDT entry */
movl $(__KERNEL_CS << 16),%eax

/******************************************************************************
Put the low 16 bits of ignore_int's address into the low 16 bits of eax. The low
32 bits of each entry (eax) are now complete.
******************************************************************************/
movw %dx,%ax /* selector = 0x0010 = cs */

/******************************************************************************
Set the low 16 bits of the high dword, i.e. the gate type and privilege bits of
each entry. The high 32 bits of each entry (edx) are now complete as well.
******************************************************************************/
movw $0x8E00,%dx /* interrupt gate - dpl=0, present */

/******************************************************************************
2. Point edi at the IDT entries to be filled in.
******************************************************************************/
lea SYMBOL_NAME(idt_table),%edi
mov $256,%ecx
/******************************************************************************
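3. This loop initializes all 256 IDT entries identically.
Note: how an address is translated (segmentation first, then paging):
logical address (selector:offset) --> find the segment's descriptor in the GDT
(via GDTR) and add its base to the offset --> linear address -->
(linear>>22) selects the page-directory entry --> ((linear>>12) & 0x3FF) selects
the page-table entry --> page frame from that entry + (linear & 0xFFF) -->
physical address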
******************************************************************************/
rp_sidt:
movl %eax,(%edi)
movl %edx,4(%edi)
addl $8,%edi /* next IDT entry */
dec %ecx
jne rp_sidt
ret
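The descriptor that setup_idt writes 256 times can be rebuilt in C. In this sketch, the handler address 0xC0100123 is made up for illustration, 0x10 is __KERNEL_CS as noted at movw %dx,%ax, and the helper names are hypothetical:

```c
#include <stdint.h>

// The two 32-bit halves of an i386 gate descriptor, low dword first.
struct gate { uint32_t lo, hi; };

// Same computation as setup_idt:
//   eax = selector << 16 | handler[15:0]
//   edx = handler[31:16] << 16 | 0x8E00  (P=1, DPL=0, 32-bit interrupt gate)
static struct gate make_intr_gate(uint16_t selector, uint32_t handler)
{
    struct gate g;
    g.lo = ((uint32_t)selector << 16) | (handler & 0xFFFFu);
    g.hi = (handler & 0xFFFF0000u) | 0x8E00u;
    return g;
}

// Reassemble and inspect the fields.
static uint32_t gate_offset(struct gate g)  { return (g.hi & 0xFFFF0000u) | (g.lo & 0xFFFFu); }
static unsigned gate_present(struct gate g) { return (g.hi >> 15) & 1u; }
static unsigned gate_dpl(struct gate g)     { return (g.hi >> 13) & 3u; }
static unsigned gate_type(struct gate g)    { return (g.hi >> 8) & 0x1Fu; }  // 0x0E here
```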

/**********************************************************************
init_task_union is a variable of type union task_union; note that it is a union:
union task_union {
struct task_struct task;
unsigned long stack[INIT_TASK_SIZE/sizeof(long)];
};
***********************************************************************/
ENTRY(stack_start)
.long SYMBOL_NAME(init_task_union)+8192
.long __KERNEL_DS

/* This is the default interrupt "handler" :-) */
int_msg:
.asciz "Unknown interrupt\n"
ALIGN
ignore_int:
cld
pushl %eax
pushl %ecx
pushl %edx
pushl %es
pushl %ds
movl $(__KERNEL_DS),%eax
movl %eax,%ds
movl %eax,%es
pushl $int_msg
call SYMBOL_NAME(printk)
popl %eax # discard the int_msg argument pushed for printk
popl %ds
popl %es
popl %edx
popl %ecx
popl %eax
iret

/**********************************************************************
The interrupt descriptor table has room for 256 idt's,the global
descriptor table is dependent on the number of tasks we can have..
***********************************************************************/
#define IDT_ENTRIES 256
#define GDT_ENTRIES (__TSS(NR_CPUS)) /* = 12 + 4*NR_CPUS, i.e. 16 when NR_CPUS == 1 */

.globl SYMBOL_NAME(idt)
.globl SYMBOL_NAME(gdt)

ALIGN
.word 0
idt_descr:
.word IDT_ENTRIES*8-1 # idt contains 256 entries
SYMBOL_NAME(idt):
.long SYMBOL_NAME(idt_table)

.word 0
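lidt/lgdt take a 6-byte operand: a 16-bit limit immediately followed by a 32-bit base. After ALIGN, the extra .word 0 presumably places the limit at offset 2 so that the 32-bit base falls on a 4-byte boundary. A sketch of that operand (the packed attribute is GCC/Clang-specific; make_desc_ptr and the base address in the check are hypothetical):

```c
#include <stdint.h>

#define IDT_ENTRIES 256

// Operand of lidt/lgdt: limit, then base, no padding in between.
struct __attribute__((packed)) desc_ptr {
    uint16_t limit;  // table size in bytes minus one
    uint32_t base;   // linear address of the first descriptor
};

static struct desc_ptr make_desc_ptr(uint32_t base, unsigned entries)
{
    struct desc_ptr d = { (uint16_t)(entries * 8 - 1), base };
    return d;
}
```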

/***********************************************************************
The layout of the GDT under Linux:

0 - null
1 - not used
2 - kernel code segment
3 - kernel data segment
4 - user code segment <-- new cacheline
5 - user data segment
6 - not used
7 - not used
8 - APM BIOS support <-- new cacheline
9 - APM BIOS support
10 - APM BIOS support
11 - APM BIOS support

The TSS+LDT descriptors are spread out a bit so that every CPU
has an exclusive cacheline for the per-CPU TSS and LDT:

12 - CPU#0 TSS <-- new cacheline
13 - CPU#0 LDT
14 - not used
15 - not used
**********************************************************************/
gdt_descr:
.word GDT_ENTRIES*8-1
SYMBOL_NAME(gdt):
.long SYMBOL_NAME(gdt_table)
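The selector values in the gdt_table comments (0x10, 0x18, 0x23, 0x2b) map onto this layout once split into index, table indicator and RPL. A sketch with hypothetical helper names; gdt_entries assumes, as the layout implies (12 fixed slots, then 4 per CPU), that the table holds 12 + 4*NR_CPUS descriptors:

```c
#include <stdint.h>

// Selector fields: bits 15..3 index, bit 2 TI (0 = GDT, 1 = LDT), bits 1..0 RPL.
static unsigned sel_index(uint16_t sel) { return sel >> 3; }
static unsigned sel_ti(uint16_t sel)    { return (sel >> 2) & 1u; }
static unsigned sel_rpl(uint16_t sel)   { return sel & 3u; }

// Assumption from the layout: 12 fixed descriptors, then TSS, LDT and two
// unused slots per CPU.
static unsigned gdt_entries(unsigned nr_cpus) { return 12u + 4u * nr_cpus; }
```

gdt_entries(32) = 140 matches the "typically 140 quadwords" remark on gdt_table.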

/***********************************************************************
This is initialized to create an identity-mapping at 0-8M (for bootup
purposes) and another mapping of the 0-8M area at virtual address
PAGE_OFFSET.

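Layout of each page-directory entry:

 31                  12 11    9  8   7   6   5   4   3   2   1   0
+----------------------+-------+---+---+---+---+---+---+---+---+---+
|   page-table base    | avail | G |PS | 0 | A |PCD|PWT|U/S|R/W| P |
+----------------------+-------+---+---+---+---+---+---+---+---+---+

PS: 0 = the entry points to a page table of 4K pages, 1 = it maps a 4M page
A: accessed
PCD: 1 = caching disabled
PWT: 1 = write-through caching
U/S: 0 = supervisor only, 1 = user accessible
R/W: 0 = read-only, 1 = writable
P: 0 = not present in memory
page-table base: physical address of the page table / 4096

.long 0x00102007: the first page table is at physical 0x00102000, i.e. pg0 below
.long 0x00103007: the second page table is at physical 0x00103000, i.e. pg1 below
Together these two entries cover 8M.

.fill BOOT_USER_PGD_PTRS-2,4,0 emits 766 consecutive zero entries. The directory
has 1024 entries in total, because the top 10 bits of a linear address select the
page-directory entry: 2^10 = 1024, so indices run from 0 to 1023 and 1024 entries
suffice.

Entries 0-767 cover the first 3G (user space) and entries 768-1023 cover the
last 1G (kernel space). In each area the first 8M is mapped by two entries:
0 and 1 for user space, 768 and 769 for kernel space.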
**********************************************************************/
.org 0x1000
ENTRY(swapper_pg_dir)
.long 0x00102007 # 0
.long 0x00103007 # 1
.fill BOOT_USER_PGD_PTRS-2,4,0 # area 766

/***********************************************************************
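The kernel lives above C0000000H, so the top 10 bits of its addresses, i.e. its
page-directory index, are 768 and above. Each page-directory entry covers 4M, so
two entries cover 8M; each entry simply gives the physical address of the
corresponding page table.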
**********************************************************************/
.long 0x00102007
.long 0x00103007

/* default: 254 entries */
.fill BOOT_KERNEL_PGD_PTRS-2,4,0 /* the remaining 1024-768-2 = 254 entries, unused for now */
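swapper_pg_dir rebuilt in C. A sketch assuming pg0/pg1 at physical 0x102000/0x103000 as stated above; make_pde and build_boot_pgdir are hypothetical names:

```c
#include <stdint.h>

#define PAGE_OFFSET 0xC0000000u

// Directory entry for a table of 4 KB pages: table physical base in
// bits 31..12, flags in the low bits (0x007 = P | R/W | U/S).
static uint32_t make_pde(uint32_t table_phys, uint32_t flags)
{
    return (table_phys & 0xFFFFF000u) | (flags & 0xFFFu);
}

// Lay out a 1024-entry directory the way swapper_pg_dir does.
static void build_boot_pgdir(uint32_t dir[1024], uint32_t pg0_phys, uint32_t pg1_phys)
{
    for (int i = 0; i < 1024; i++)
        dir[i] = 0;                       // the .fill runs
    unsigned k = PAGE_OFFSET >> 22;       // 768
    dir[0] = dir[k] = make_pde(pg0_phys, 0x007);
    dir[1] = dir[k + 1] = make_pde(pg1_phys, 0x007);
}
```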

/***********************************************************************
The page tables are initialized to only 8MB here - the final page
tables are set up later depending on memory size.

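The .org directive sets the location counter, i.e. the offset at which the
following bytes are placed. For example, if "org 200h" precedes the first
instruction of a source file, the assembler places the first byte of output at
200h and lays out what follows sequentially until the next org.

Note that the offset is relative to the start of the current section in this
file. Here .org 0x2000 is relative to .text; head.S's .text starts the kernel
image, which is loaded at physical 0x100000, so pg0 is at physical 0x102000.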
**********************************************************************/
.org 0x2000
ENTRY(pg0)

.org 0x3000
ENTRY(pg1)

/***********************************************************************
empty_zero_page must immediately follow the page tables ! (The
initialization loop counts until empty_zero_page)
**********************************************************************/

.org 0x4000
ENTRY(empty_zero_page)

.org 0x5000
ENTRY(empty_bad_page)

.org 0x6000
ENTRY(empty_bad_pte_table)

#if CONFIG_X86_PAE

.org 0x7000
ENTRY(empty_bad_pmd_table)

.org 0x8000

#else

.org 0x7000

#endif
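Applying the .org rule to every label gives the physical and virtual addresses used throughout this file. A sketch that assumes, as the text does, that head.S's .text starts the image at physical 0x100000 (virtual 0xC0100000); the enum and helper names are hypothetical:

```c
#include <stdint.h>

#define TEXT_PHYS 0x00100000u  // assumption: load address of .text
#define TEXT_VIRT 0xC0100000u  // assumption: link address of .text

// .org offsets used above (non-PAE build).
enum {
    ORG_SWAPPER_PG_DIR      = 0x1000,
    ORG_PG0                 = 0x2000,
    ORG_PG1                 = 0x3000,
    ORG_EMPTY_ZERO_PAGE     = 0x4000,
    ORG_EMPTY_BAD_PAGE      = 0x5000,
    ORG_EMPTY_BAD_PTE_TABLE = 0x6000,
};

static uint32_t org_phys(uint32_t org) { return TEXT_PHYS + org; }
static uint32_t org_virt(uint32_t org) { return TEXT_VIRT + org; }
```

This reproduces swapper_pg_dir at 0x101000, and the gap between pg0 and empty_zero_page is 0x2000 bytes = 2048 four-byte entries, exactly what the stosl loop fills.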

/***********************************************************************
This starts the data section. Note that the above is all in the
text section because it has alignment requirements that we cannot
fulfill any other way.
**********************************************************************/
.data

ALIGN
/***********************************************************************
This contains typically 140 quadwords, depending on NR_CPUS.

NOTE! Make sure the gdt descriptor in head.S matches this if you
change anything.
**********************************************************************/
ENTRY(gdt_table)
.quad 0x0000000000000000 /* NULL descriptor */
.quad 0x0000000000000000 /* not used */
.quad 0x00cf9a000000ffff /* 0x10 kernel 4GB code at 0x00000000 */
.quad 0x00cf92000000ffff /* 0x18 kernel 4GB data at 0x00000000 */
.quad 0x00cffa000000ffff /* 0x23 user 4GB code at 0x00000000 */
.quad 0x00cff2000000ffff /* 0x2b user 4GB data at 0x00000000 */
.quad 0x0000000000000000 /* not used */
.quad 0x0000000000000000 /* not used */
/*
* The APM segments have byte granularity and their bases
* and limits are set at run time.
*/
.quad 0x0040920000000000 /* 0x40 APM set up for bad BIOS's */
.quad 0x00409a0000000000 /* 0x48 APM CS code */
.quad 0x00009a0000000000 /* 0x50 APM CS 16 code (16 bit) */
.quad 0x0040920000000000 /* 0x58 APM DS data */
.fill NR_CPUS*4,8,0 /* space for TSS's and LDT's */
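The quad words above can be decoded field by field. A C sketch with hypothetical names, following the standard 8-byte segment-descriptor layout (limit 15..0, base 23..0, access byte, limit 19..16 plus flags, base 31..24):

```c
#include <stdint.h>

struct seg_info {
    uint32_t base;
    uint32_t limit_bytes;  // last valid offset after applying granularity
    unsigned type, code_or_data, dpl, present, default_32, gran_4k;
};

static struct seg_info decode_desc(uint64_t d)
{
    struct seg_info s;
    uint32_t limit = (uint32_t)(d & 0xFFFF) | (uint32_t)((d >> 32) & 0xF0000);
    s.base = (uint32_t)((d >> 16) & 0xFFFFFF) | (uint32_t)((d >> 32) & 0xFF000000u);
    s.type         = (unsigned)(d >> 40) & 0xF;  // e.g. 0xA code r/x, 0x2 data r/w
    s.code_or_data = (unsigned)(d >> 44) & 1;    // S bit
    s.dpl          = (unsigned)(d >> 45) & 3;
    s.present      = (unsigned)(d >> 47) & 1;
    s.default_32   = (unsigned)(d >> 54) & 1;    // D/B bit
    s.gran_4k      = (unsigned)(d >> 55) & 1;    // G bit
    s.limit_bytes  = s.gran_4k ? (limit << 12) | 0xFFF : limit;
    return s;
}
```

0x00cf9a000000ffff decodes to base 0, limit 0xFFFFF with 4K granularity (4 GB), DPL 0, type 0xA (execute/read code); the user descriptors differ only in DPL = 3.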

/*
* This is to aid debugging, the various locking macros will be putting
* code fragments here. When an oops occurs we'd rather know that it's
* inside the .text.lock section rather than as some offset from whatever
* function happens to be last in the .text segment.
*/
.section .text.lock
ENTRY(stext_lock)

Source: CU community post "Annotated arch/i386/kernel/head.S"
