From: LINK
Outline
- PC architecture
- x86 instruction set
- gcc calling conventions
- PC emulation
PC architecture
Data registers: AX, BX, CX, DX
Address registers: SP, BP, SI, DI
Instruction register: IP
Switching between 16-bit and 32-bit modes
A full PC has:
- an x86 CPU with registers, execution unit, and memory management
- CPU chip pins include address and data signals
- memory
- disk
- keyboard (Input)
- display (Output)
- other resources: BIOS ROM, clock, ...
We will start with the original 16-bit 8086 CPU (1978)
CPU runs instructions:
for(;;){
run next instruction
}
Needs work space: registers
- four 16-bit data registers: AX, BX, CX, DX
- each in two 8-bit halves, e.g. AH and AL
- very fast, very few
More work space: memory
- CPU sends out address on address lines (wires, one bit per wire)
- Data comes back on data lines
- or data is written to data lines
Add address registers: pointers into memory
- SP - stack pointer
- BP - frame base pointer (compare with the From Nand to Tetris course)
- SI - source index
- DI - destination index
Instructions are in memory too!
- IP - instruction pointer (PC on PDP-11, everything else)
- increment after running each instruction
- can be modified by CALL, RET, JMP, conditional jumps
- Want conditional jumps
- FLAGS - various condition codes
- whether last arithmetic operation overflowed
- ... was positive/negative
- ... was [not] zero
- ... carry/borrow on add/subtract
- ... etc.
- whether interrupts are enabled
- direction of data copy instructions
- JP, JN, J[N]Z, J[N]C, J[N]O ...
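To make the condition codes concrete, here is a minimal C sketch of how a 16-bit ADD could compute four of them. The `struct flags` and `add16` names are made up for illustration, and the real FLAGS register has further bits (parity, auxiliary carry, and the control bits listed above) that this sketch ignores. A conditional jump such as JZ or JC then only has to test one of these bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Four of the FLAGS condition codes described above (hypothetical struct). */
struct flags {
    bool zf;  /* result was zero */
    bool sf;  /* result was negative: top bit set */
    bool cf;  /* unsigned carry out of bit 15 */
    bool of;  /* signed overflow */
};

/* Add two 16-bit values the way a 16-bit ADD would, recording the flags. */
static uint16_t add16(uint16_t a, uint16_t b, struct flags *f) {
    uint32_t wide = (uint32_t)a + (uint32_t)b;  /* 17 bits: keeps the carry */
    uint16_t r = (uint16_t)wide;

    f->zf = (r == 0);
    f->sf = (r & 0x8000) != 0;
    f->cf = (wide >> 16) != 0;
    /* Overflow: operands had the same sign but the result's sign differs. */
    f->of = ((~(a ^ b) & (a ^ r)) & 0x8000) != 0;
    return r;
}
```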
Still not interesting - need I/O to interact with outside world
Original PC architecture: use dedicated I/O space
Works same as memory accesses but set I/O signal
Only 1024 I/O addresses
Accessed with special instructions (IN, OUT)
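Example: write a byte to the line printer:
#include <stdint.h>

/* Port I/O helpers written as GCC inline assembly (xv6's x86.h has similar ones). */
static inline uint8_t inb(uint16_t port) {
    uint8_t data;
    __asm__ volatile("inb %1, %0" : "=a"(data) : "d"(port));
    return data;
}
static inline void outb(uint16_t port, uint8_t data) {
    __asm__ volatile("outb %0, %1" : : "a"(data), "d"(port));
}

#define DATA_PORT    0x378
#define STATUS_PORT  0x379
#define BUSY         0x80   /* status bit 7 is inverted: it reads 0 while the printer is busy */
#define CONTROL_PORT 0x37A
#define STROBE       0x01

void lpt_putc(int c) {
    /* wait for printer to consume previous byte */
    while ((inb(STATUS_PORT) & BUSY) == 0)
        ;
    /* put the byte on the parallel lines */
    outb(DATA_PORT, (uint8_t)c);
    /* tell the printer to look at the data */
    outb(CONTROL_PORT, STROBE);
    outb(CONTROL_PORT, 0);
}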
Memory-Mapped I/O
- Use normal physical memory addresses
- Gets around limited size of I/O address space
- No need for special instructions
- System controller routes to appropriate device
- Works like "magic" memory:
- Addressed and accessed like memory, but ...
- ... does not behave like memory!
- Reads and writes can have "side effects"
- Read results can change due to external events
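In short: I/O devices are mapped into designated regions of the physical address space and accessed with ordinary memory instructions.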
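As a minimal memory-mapped I/O sketch in C: in color text mode the VGA text buffer sits at physical address 0xB8000, inside the VGA region of the memory map, and each 16-bit cell holds a character in its low byte and a color attribute in its high byte. The `vga_put` helper is hypothetical; it takes the buffer as a parameter so it can be exercised on an ordinary array, whereas a kernel running with low memory identity-mapped would pass `(volatile uint16_t *)0xB8000`. The `volatile` qualifier tells the compiler that every store matters and must not be cached or dropped, which is what "does not behave like memory" demands.

```c
#include <stdint.h>

#define VGA_TEXT_BASE 0xB8000  /* color text-mode buffer, inside the VGA region */
#define VGA_COLS      80

/* Store one character cell: low byte = character, high byte = attribute.
   In a kernel, buf would be (volatile uint16_t *)VGA_TEXT_BASE; volatile makes
   the compiler emit every store, since the "memory" is really the display. */
static void vga_put(volatile uint16_t *buf, int row, int col, char c, uint8_t attr) {
    buf[row * VGA_COLS + col] = (uint16_t)(((uint16_t)attr << 8) | (uint8_t)c);
}
```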
What if we want to use more than 2^16 bytes of memory?
(2^16 because the 8086 is a 16-bit machine: a 16-bit register or address can name only 2^16 bytes = 64 KB.)
8086 has 20-bit physical addresses, can have 1 Meg RAM
the extra four bits usually come from a 16-bit "segment register":
- CS: code segment, for fetches via IP
- SS: stack segment, for load/store via SP and BP
- DS: data segment, for load/store via other registers
- ES: another data segment, destination for string operations
virtual to physical translation: pa = va + seg*16
- e.g. set CS = 4096 to execute starting at 65536
Lab 1 uses this scheme; note that multiplying by 16 is exactly a left shift by one hex digit (e.g. 0x1000 * 16 = 0x10000).
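The translation rule can be written as a small C function. This is a sketch of 8086 behavior under one assumption worth stating: the 8086 has only 20 address lines, so a sum past 1 MB wraps around to low memory (later CPUs with the A20 line enabled do not wrap). The function name is hypothetical.

```c
#include <stdint.h>

/* 8086 real-mode translation: pa = seg*16 + offset, truncated to 20 bits
   because the 8086 has only 20 address lines. */
static uint32_t real_mode_pa(uint16_t seg, uint16_t off) {
    uint32_t pa = ((uint32_t)seg << 4) + off;  /* << 4 == * 16: one hex digit */
    return pa & 0xFFFFF;                       /* wrap at 1 MB */
}
```

For example, `real_mode_pa(0x07C0, 0)` and `real_mode_pa(0x0000, 0x7C00)` both name physical address 0x7C00: many segment:offset pairs alias the same byte.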
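tricky: can't use the 16-bit address of a stack variable as a pointer (a stack variable's offset is relative to SS, while an ordinary pointer dereference goes through DS, so the two agree only if SS == DS)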
a far pointer includes the full segment:offset pair (16 + 16 bits)
tricky: pointer arithmetic and array indexing across segment boundaries
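A short C sketch of why segment boundaries are tricky, assuming a far pointer is stored as a `struct far_ptr` (a hypothetical layout holding the 16-bit segment and 16-bit offset). Incrementing only the 16-bit offset wraps back to the start of the same 64 KB segment, while recomputing segment and offset from the linear address ("normalizing") actually moves past the boundary.

```c
#include <stdint.h>

/* Hypothetical far-pointer representation: full segment plus offset (16 + 16 bits). */
struct far_ptr {
    uint16_t seg;
    uint16_t off;
};

static uint32_t far_to_linear(struct far_ptr p) {
    return ((uint32_t)p.seg << 4) + p.off;
}

/* Naive arithmetic: only the 16-bit offset changes, so it wraps
   back to the start of the same 64 KB segment. */
static struct far_ptr far_add_naive(struct far_ptr p, uint16_t n) {
    p.off = (uint16_t)(p.off + n);
    return p;
}

/* Normalizing arithmetic: go through the linear address and rebuild
   seg:off (offset kept below 16), so the result can cross segments. */
static struct far_ptr far_add_normalized(struct far_ptr p, uint32_t n) {
    uint32_t lin = far_to_linear(p) + n;
    struct far_ptr q = { (uint16_t)(lin >> 4), (uint16_t)(lin & 0xF) };
    return q;
}
```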
But 8086's 16-bit addresses and data were still painfully small
80386 added support for 32-bit data and addresses (1985)
boots in 16-bit mode; boot.S switches to 32-bit mode
registers are 32 bits wide, called EAX rather than AX
operands and addresses that were 16-bit became 32-bit in 32-bit mode, e.g. ADD does 32-bit arithmetic
prefixes 0x66 (operand size) and 0x67 (address size) toggle between 16-bit and 32-bit operands and addresses: in 32-bit mode, MOVW is expressed as 0x66 MOVW
the .code32 in boot.S tells assembler to generate 0x66 for e.g. MOVW
80386 also changed segments and added paged memory...
Example instruction encoding
b8 cd ab 16-bit CPU, AX <- 0xabcd
b8 34 12 cd ab 32-bit CPU, EAX <- 0xabcd1234
66 b8 cd ab 32-bit CPU, AX <- 0xabcd
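The three encodings above differ only in the default operand size and the 0x66 prefix. A minimal C decoder for just this one instruction (opcode 0xB8, MOV immediate into AX/EAX) shows how the prefix flips the immediate's width; `decode_mov_ax_imm` and its `default32` flag are hypothetical names, and everything else a real decoder handles (other opcodes, ModRM, the 0x67 address-size prefix) is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode "MOV imm, AX/EAX" (opcode 0xB8), honoring the 0x66 operand-size prefix.
   default32 is nonzero when the CPU is in 32-bit mode. Returns the number of
   bytes consumed, or 0 if the bytes are not this instruction. */
static size_t decode_mov_ax_imm(const uint8_t *code, int default32,
                                uint32_t *imm, int *size_bits) {
    size_t i = 0;
    int op32 = default32;
    if (code[i] == 0x66) {      /* prefix flips between 16- and 32-bit operands */
        op32 = !op32;
        i++;
    }
    if (code[i] != 0xB8)
        return 0;
    i++;
    if (op32) {                 /* 4-byte little-endian immediate */
        *imm = (uint32_t)code[i] | (uint32_t)code[i + 1] << 8 |
               (uint32_t)code[i + 2] << 16 | (uint32_t)code[i + 3] << 24;
        i += 4;
    } else {                    /* 2-byte little-endian immediate */
        *imm = (uint32_t)code[i] | (uint32_t)code[i + 1] << 8;
        i += 2;
    }
    *size_bits = op32 ? 32 : 16;
    return i;
}
```

Fed the three byte sequences from the table, it reports a 16-bit 0xabcd, a 32-bit 0xabcd1234, and a 16-bit 0xabcd respectively.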
x86 Physical Memory Map
- The physical address space mostly looks like ordinary RAM
- Except some low-memory addresses actually refer to other things
- Writes to VGA memory appear on the screen
- Reset or power-on jumps to ROM at 0xfffffff0 (so must be ROM at top...)
+------------------+  <- 0xFFFFFFFF (4GB)
|      32-bit      |
|  memory mapped   |
|     devices      |
|                  |
/\/\/\/\/\/\/\/\/\/\
/\/\/\/\/\/\/\/\/\/\
|                  |
|      Unused      |
|                  |
+------------------+  <- depends on amount of RAM
|                  |
|                  |
| Extended Memory  |
|                  |
|                  |
+------------------+  <- 0x00100000 (1MB)
|     BIOS ROM     |
+------------------+  <- 0x000F0000 (960KB)
|  16-bit devices, |
|  expansion ROMs  |
+------------------+  <- 0x000C0000 (768KB)
|   VGA Display    |  <- writes to VGA memory appear on the screen
+------------------+  <- 0x000A0000 (640KB)
|                  |
|    Low Memory    |  <- ordinary RAM
|                  |
+------------------+  <- 0x00000000
x86 Instruction Set
Intel syntax: op dst, src (Intel manuals!)
AT&T (gcc/gas) syntax: op src, dst (labs, xv6)
- uses b, w, l suffixes on instructions to specify the size of operands (byte, 16-bit word, 32-bit long)
Operands are registers, constants, memory via register, or memory via constant
Examples:
AT&T syntax          "C"-ish equivalent           addressing mode
movl %eax, %edx      edx = eax;                   register mode
movl $0x123, %edx    edx = 0x123;                 immediate
movl 0x123, %edx     edx = *(int32_t*)0x123;      direct
movl (%ebx), %edx    edx = *(int32_t*)ebx;        indirect
movl 4(%ebx), %edx   edx = *(int32_t*)(ebx+4);    displaced
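The "C"-ish column can be made literal with a tiny simulated memory. This sketch (the `mem` array, `load32`, and the `MODE_*` names are all hypothetical) returns the value that `movl SRC, %edx` would place in %edx for each addressing mode; memory loads assemble four bytes in little-endian order, as x86 stores them.

```c
#include <stdint.h>

static uint8_t mem[1024];  /* hypothetical simulated memory */

/* Load 4 bytes little-endian, like *(int32_t*)addr on x86. */
static uint32_t load32(uint32_t addr) {
    return (uint32_t)mem[addr] | (uint32_t)mem[addr + 1] << 8 |
           (uint32_t)mem[addr + 2] << 16 | (uint32_t)mem[addr + 3] << 24;
}

enum mode { MODE_REGISTER, MODE_IMMEDIATE, MODE_DIRECT, MODE_INDIRECT, MODE_DISPLACED };

/* Value that "movl SRC, %edx" copies into %edx. reg is the source or base
   register's value; k is the constant or displacement. */
static uint32_t movl_source(enum mode m, uint32_t reg, uint32_t k) {
    switch (m) {
    case MODE_REGISTER:  return reg;              /* movl %eax, %edx    */
    case MODE_IMMEDIATE: return k;                /* movl $0x123, %edx  */
    case MODE_DIRECT:    return load32(k);        /* movl 0x123, %edx   */
    case MODE_INDIRECT:  return load32(reg);      /* movl (%ebx), %edx  */
    case MODE_DISPLACED: return load32(reg + k);  /* movl 4(%ebx), %edx */
    }
    return 0;
}
```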
Instruction classes
- data movement: MOV, PUSH, POP, ...
- arithmetic: TEST, SHL, ADD, AND, ...
- i/o: IN, OUT, ...
- control: JMP, JZ, JNZ, CALL, RET
- string: REP MOVSB, ...
- system: IRET, INT
Intel architecture manual Volume 2 is the reference
gcc x86 calling conventions
x86 dictates that the stack grows down:
Example instruction    What it does
pushl %eax             subl $4, %esp
                       movl %eax, (%esp)
popl %eax              movl (%esp), %eax
                       addl $4, %esp
call 0x12345           pushl %eip (*)
                       movl $0x12345, %eip (*)
ret                    popl %eip (*)
(*) Not real instructions
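The table can be simulated directly in C. This is a sketch with a made-up 256-byte stack and global `esp`/`eip` variables standing in for the registers; each helper performs exactly the steps listed in the "What it does" column, with the return address passed explicitly since a C function has no "address of the next instruction".

```c
#include <stdint.h>

static uint8_t stack_mem[256];            /* hypothetical stack memory */
static uint32_t esp = sizeof stack_mem;   /* empty: esp is just past the top */
static uint32_t eip;

static void pushl(uint32_t v) {
    esp -= 4;                             /* subl $4, %esp */
    for (int i = 0; i < 4; i++)           /* movl v, (%esp), little-endian */
        stack_mem[esp + i] = (uint8_t)(v >> (8 * i));
}

static uint32_t popl(void) {
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)           /* movl (%esp), v */
        v |= (uint32_t)stack_mem[esp + i] << (8 * i);
    esp += 4;                             /* addl $4, %esp */
    return v;
}

/* call: push the return address (the instruction after the call), then jump. */
static void call(uint32_t target, uint32_t return_addr) {
    pushl(return_addr);
    eip = target;
}

/* ret: pop the return address into eip. */
static void ret(void) {
    eip = popl();
}
```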
GCC dictates how the stack is used. Contract between caller and callee on x86:
- at entry to a function (i.e. just after call):
  - %eip points at first instruction of function
  - %esp+4 points at first argument
  - %esp points at return address
- after ret instruction:
  - %eip contains return address
  - %esp points at arguments pushed by caller
  - called function may have trashed arguments
  - %eax (and %edx, if return type is 64-bit) contains return value (or trash if function is void)
  - %eax, %edx (above), and %ecx may be trashed
  - %ebp, %ebx, %esi, %edi must contain contents from time of call
- Terminology:
  - %eax, %ecx, %edx are "caller save" registers
  - %ebp, %ebx, %esi, %edi are "callee save" registers
Functions can do anything that doesn't violate the contract. By convention, GCC does more:
- each function has a stack frame marked by %ebp, %esp
               +------------+   |
               | arg 2      |   \
               +------------+    >- previous function's stack frame
               | arg 1      |   /
               +------------+   |
               | ret %eip   |   /
               +============+
               | saved %ebp |   \
        %ebp-> +------------+   |
               |            |   |
               |   local    |   \
               | variables, |    >- current function's stack frame
               |    etc.    |   /
               |            |   |
               |            |   |
        %esp-> +------------+   /
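Compared with the From Nand to Tetris notes: %ebp and %esp delimit the current function's stack frame, marking its base and its top. The frame's size depends on how much space the function needs (saved registers, local variables, outgoing arguments); the function's own arguments sit above %ebp, in the caller's frame.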
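- %esp can move to make stack frame bigger, smaller
- %ebp points at saved %ebp from previous function, chain to walk stack
When a new function is entered, its prologue saves the caller's %ebp and records the current stack top in %ebp; after that, %esp moves as values are pushed and popped.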
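Because each saved %ebp points at the previous one, a debugger or kernel backtrace can walk the chain. Here is a minimal C sketch: `struct frame` mirrors the two words at the bottom of each frame in the diagram (saved %ebp at 0(%ebp), return %eip at 4(%ebp)); that layout matches only on 32-bit x86, so the example builds the chain by hand rather than reading the real stack. It assumes the outermost frame's saved %ebp is 0 (for example, because entry code zeroed %ebp), and `walk_frames` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* The two words at the bottom of each frame in the diagram: what %ebp points at. */
struct frame {
    struct frame *saved_ebp;  /* 0(%ebp): caller's %ebp */
    uint32_t ret_eip;         /* 4(%ebp): return address */
};

/* Walk the saved-%ebp chain, recording up to max return addresses.
   Assumes the outermost frame's saved %ebp is 0 (NULL). */
static size_t walk_frames(const struct frame *ebp, uint32_t *out, size_t max) {
    size_t n = 0;
    while (ebp != NULL && n < max) {
        out[n++] = ebp->ret_eip;
        ebp = ebp->saved_ebp;
    }
    return n;
}
```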
function prologue:
pushl %ebp
movl %esp, %ebp
or
enter $0, $0
enter usually not used: 4 bytes vs 3 for pushl+movl, not on hardware fast-path anymore
function epilogue can easily find return EIP on stack:
movl %ebp, %esp
popl %ebp
or
leave
leave used often because it's 1 byte, vs 3 for movl+popl
Big example:
C code:
int g(int x) { return x+3; }
int f(int x) { return g(x); }
int main(void) { return f(8)+1; }
assembler:
_main:
# prologue
pushl %ebp
movl %esp, %ebp
# body
pushl $8
call _f
addl $1, %eax
# epilogue
movl %ebp, %esp
popl %ebp
ret
_f:
# prologue
pushl %ebp
movl %esp, %ebp
# body (8(%esp) == 8(%ebp) here, since %esp == %ebp)
pushl 8(%esp)
call _g
# epilogue
movl %ebp, %esp
popl %ebp
ret
_g:
# prologue
pushl %ebp
movl %esp, %ebp
# save %ebx
pushl %ebx
# body
movl 8(%ebp), %ebx
addl $3, %ebx
movl %ebx, %eax
# restore %ebx
popl %ebx
# epilogue
movl %ebp, %esp
popl %ebp
ret
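Notice that every function follows the same pattern: record the initial stack top (movl %esp, %ebp), load its arguments, (call another function,) tear down its frame, and return.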
Super-small _g (no frame pointer, so its argument is at 4(%esp) rather than 8(%ebp)):
_g:
movl 4(%esp), %eax
addl $3, %eax
ret
Compiling, linking, loading:
- Preprocessor takes C source code (ASCII text), expands #include etc., produces C source code
- Compiler takes C source code (ASCII text), produces assembly language (also ASCII text)
- Assembler takes assembly language (ASCII text), produces a .o file (binary, machine-readable!)
- Linker takes multiple .o's, produces a single program image (binary)
- Loader loads the program image into memory at run-time and starts it executing
PC emulation
The Bochs emulator works by
- doing exactly what a real PC would do,
- only implemented in software rather than hardware!
Runs as a normal process in a "host" operating system (e.g., Linux)
Uses normal process storage to hold emulated hardware state: e.g.,
Stores emulated CPU registers in global variables
#include <stdint.h>

int32_t regs[8];
#define REG_EAX 0   /* indices follow x86's register encoding: EAX, ECX, EDX, EBX, ... */
#define REG_ECX 1
#define REG_EDX 2
#define REG_EBX 3
...
int32_t eip;
int16_t segregs[4];
...
Stores emulated physical memory in Bochs's memory
char mem[256*1024*1024];
256 * 1024 * 1024 bytes = 256 MB
Execute instructions by simulating them in a loop:
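A minimal sketch of such a loop, assuming a made-up emulator state (`regs`, `eip`, `mem`) and handling only a handful of real 32-bit x86 opcodes: MOV r32, imm32 (0xB8+r), INC r32 (0x40+r), register-to-register ADD (0x01 with a ModRM byte whose mod field is 11), and stopping at HLT (0xF4). Flags, prefixes, and memory operands are not modeled; a real emulator such as Bochs decodes all of those and hundreds more opcodes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical emulator state. regs[] is indexed by x86 register number
   (0=EAX, 1=ECX, 2=EDX, 3=EBX, ...). */
static uint32_t regs[8];
static uint32_t eip;
static uint8_t mem[64 * 1024];

static uint32_t fetch32(uint32_t addr) {  /* little-endian immediate */
    return (uint32_t)mem[addr] | (uint32_t)mem[addr + 1] << 8 |
           (uint32_t)mem[addr + 2] << 16 | (uint32_t)mem[addr + 3] << 24;
}

/* Copy a program image into emulated memory and start at address 0. */
static void load_program(const uint8_t *code, size_t len) {
    memcpy(mem, code, len);
    eip = 0;
}

/* Fetch, decode, execute until HLT or an opcode this sketch does not know;
   returns the opcode that stopped it. */
static uint8_t run(void) {
    for (;;) {
        uint8_t op = mem[eip];                        /* fetch */
        if (op >= 0xB8 && op <= 0xBF) {               /* MOV r32, imm32 */
            regs[op - 0xB8] = fetch32(eip + 1);
            eip += 5;
        } else if (op >= 0x40 && op <= 0x47) {        /* INC r32 */
            regs[op - 0x40] += 1;
            eip += 1;
        } else if (op == 0x01 && (mem[eip + 1] >> 6) == 3) {
            uint8_t modrm = mem[eip + 1];             /* ADD r/m32, r32, register form */
            regs[modrm & 7] += regs[(modrm >> 3) & 7];
            eip += 2;
        } else {
            return op;                                /* HLT (0xF4) or unsupported */
        }
    }
}
```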
Simulate the PC's physical memory map by decoding emulated "physical" addresses just like a PC would, on every emulated memory read and write:
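A minimal C sketch of that decoding, assuming a hypothetical emulator that keeps only the low 1 MB of the map in one `mem` array: reads of mapped addresses come from the array; writes to low memory are plain RAM stores; writes to the VGA range also bump a counter standing in for "redraw the screen"; writes to the ROM range (0xC0000 up to 1 MB) are ignored; and addresses past the array read as 0xFF in this sketch. Extended memory and 32-bit devices are left out.

```c
#include <stdint.h>

/* Hypothetical emulator: the low 1 MB of the physical map lives in mem[]. */
#define MEM_SIZE  0x00100000u  /* 1 MB */
#define VGA_START 0x000A0000u  /* 640KB */
#define ROM_START 0x000C0000u  /* 768KB: expansion ROMs, then BIOS ROM */

static uint8_t mem[MEM_SIZE];
static unsigned vga_writes;    /* stands in for "update the emulated screen" */

static uint8_t read_byte(uint32_t pa) {
    if (pa < MEM_SIZE)
        return mem[pa];        /* RAM, VGA memory, and ROM images */
    return 0xFF;               /* unmapped: this sketch reads all ones */
}

static void write_byte(uint32_t pa, uint8_t v) {
    if (pa < VGA_START) {
        mem[pa] = v;           /* low memory: ordinary RAM */
    } else if (pa < ROM_START) {
        mem[pa] = v;
        vga_writes++;          /* a real emulator would redraw the window here */
    }
    /* ROM (ROM_START up to 1 MB) and unmapped addresses: write ignored */
}
```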
Simulate I/O devices, etc., by detecting accesses to "special" memory and I/O space and emulating the correct behavior: e.g.,
- Reads/writes to emulated hard disk transformed into reads/writes of a file on the host system
- Writes to emulated VGA display hardware transformed into drawing into an X window
- Reads from emulated PC keyboard transformed into reads from X input event queue
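The first bullet can be sketched in C with standard stdio calls: guest sector `lba` maps to byte offset `lba * 512` in a disk-image file on the host. The function names and the 512-byte sector size are assumptions of this sketch; a real emulator also has to model the disk controller's registers and interrupts around these transfers.

```c
#include <stdint.h>
#include <stdio.h>

#define SECTOR_SIZE 512

/* Guest reads sector lba: read bytes [lba*512, lba*512 + 512) of the host image.
   Returns 0 on success, -1 on error. */
static int disk_read_sector(FILE *image, uint32_t lba, uint8_t buf[SECTOR_SIZE]) {
    if (fseek(image, (long)lba * SECTOR_SIZE, SEEK_SET) != 0)
        return -1;
    return fread(buf, 1, SECTOR_SIZE, image) == SECTOR_SIZE ? 0 : -1;
}

/* Guest writes sector lba: write the same byte range of the host image. */
static int disk_write_sector(FILE *image, uint32_t lba, const uint8_t buf[SECTOR_SIZE]) {
    if (fseek(image, (long)lba * SECTOR_SIZE, SEEK_SET) != 0)
        return -1;
    return fwrite(buf, 1, SECTOR_SIZE, image) == SECTOR_SIZE ? 0 : -1;
}
```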