click below
click below
Normal Size Small Size show me how
CS 3375
Question | Answer |
---|---|
The performance of a machine is determined by instruction count (IC), clock cycles per instruction (CPI), and clock cycle time. Here, IC is determined by compiler and [ ? ]. | Instruction Set Architecture |
In a single cycle data path, every instruction begins on one clock edge and completes execution on the next clock edge. So the CPI is [ ? ] | CPI = 1 |
In 6 by 1 multiplexer, how many selector bits do we need? | 3 bits |
How to determine the length of single clock cycle in a single-cycle datapath? | The clock cycle is determined by the longest possible path (or the slowest operation, lw) in the machine |
Pipeline rate is limited by [slowest? or fastest?] pipeline stage. | Slowest |
List five pipeline stages in a single-cycle datapath. | IF(Instruction Fetch), ID(Instruction Decode), EX(Execute / Address Calculation), MEM(Memory Access (read / write), and WB (Write Back (results into register file) |
Pipelining improves performance by increasing instruction throughput and decreasing the execution time of an individual instruction. | False |
Two's complement operation, | Subtraction using addition of negative numbers |
Arithmetic function: | - Addition & subtraction - Need to know number representations - Integer and floating point arithmetic |
Logical operations: | - and - or - not |
How do you know there is no overflow? | - When adding a positive and a negative number - When signs are the same for subtraction |
How do we know when there is an overflow | Overflow occurs when the value affects the sign: |
Examples of overflow: | -When adding two positives yields a negative -Adding two negatives gives a positive, or -Subtract a negative from a positive and get a negative, or -Subtract a positive from a negative and get a positive, |
MIPS commands that don't detect overflow | addu, addiu, subu |
lb command | Loads a byte and stores the sign-extended version in a word |
lbu command | Loads a byte and stores it in a word |
Drawback of integer binary multiplication | - Half of 64-bit multiplicand register are zeros - Half of 64 bit adder is adding zeros |
Solution to the drawback of integer binary multiplication | - Shift product right instead of multiplicand left - Only left half of product register added to multiplicand |
Steps in hardware integer division: | - Shift the dividend left one position - Subtract the divisor from the left half of the dividend - if (result positive), Shift left a 1 into the quotient; - else, Shift left a 0 into the quotient; - Once the result is positive, Repeat the process for |
Signed magnitude representation | (-1)^S x F x B^E, S = sign (either 0 (pos) or 1 (neg)) F = fraction (also called the mantissa) B = base (implied) (Ex, 2 or 10) E = exponent |
Overflow in Signed magnitude representation | The exponent is too large to be represented in the exponent field |
Underflow in Signed magnitude representation | The negative exponent is too large to fit in the exponent field |
IEEE 754 floating-point standard: normalized numbers in binary format | (-1)^s x (1 + Fraction) x 2^E |
normalized numbers in binary format: single precision | The significand is 24 bits long (1 + 23-bit fraction) |
normalized numbers in binary format: double precision | The significand is 53 bits long (1 + 52-bit fraction) |
normalized numbers in binary format: single precision exponential bias | Exponential Bias = 127 |
normalized numbers in binary format: double precision exponential bias | Exponential Bias = 1023 |
How is the Instruction count determined? | Determined by ISA and compiler |
Instruction formats: R-type | Instructions like registers and words of data, are 32 bits long Ex: add $t0, $s1, $s2 |
Instruction formats: I-type | Load/Store instruction format Ex: lw $t1, 24($s2) |
Instruction formats: J-type | Jump formats Ex: jal 0x00400030 |
Instruction implementation: PC goes to | instruction memory, fetch instruction |
Instruction implementation:Register numbers goes to | register file, read registers |
In R - type instructions format, use ALU to calculate | Arithmetic result |
In I - type instructions format, use ALU to calculate | Memory address for load/store |
In J- type instructions format, use ALU to calculate | Branch target address |
The functional units in the MIPS: Combinational | - Elements that operate on data values - Output is a function of input |
The functional units in the MIPS: Sequential | - Elements that contain state - Output depends on both inputs and the contents of the internal state |
Logic Design Conventions: Sequential Elements: Register | - stores data in a circuit - Uses a clock signal to determine when to update the stored value - Edge-triggered: update when Clk changes from 0 to 1 |
Logic Design Conventions: Sequential Elements: Register with write control | - Only update on clock edge when write control input is 1 - Used when stored value is required later |
Clocking Methodology | - When signals can be read & when they can be written - Edge-triggered clocking – update only on a clock edge |
instruction memory | stores the instructions of a program |
program counter (PC) | keeps the address of the instruction |
adder | increment the PC or jump to the address of the next instruction |
To read one word from a register? | Need an input (register number) and one output (data) |
To write one word from a register? | Need two inputs (register number & data) |
Single cycle datapath: The implementation attempts to execute all instructions in 1 clock cycle | CPI is always 1 |
Any element needed more than once must be | duplicated |
No datapath resource can be used more than once per instruction? | A separate instruction memory is required |
Multiplexor (Data Selector): | - Allow multiple connections to the input of an element - A control signal selects among the inputs |
The key differences between R-type and memory-reference (lw, sw) are: | - The second input to the ALU is either a register (R-type) or the sign-extended half of the instruction (memory) - The value stored into a destination register comes from either the ALU (R-type) or the memory (lw? or sw?) |
Single Cycle Datapath & Control Unit equation | CPU = Datapath + Control |
Single-cycle design in a Single Cycle Datapath & Control Unit: | - Instruction takes exactly one clock cycle (CPI = 1) - Datapath units used only once per cycle - Writable state updated at end of cycle |
What must be “controlled” in a Single Cycle Datapath & Control Unit? | - Multiplexors - Writable state elements: * Register File, Data Memory - ALU (which operation?) – 4 bits |
Control unit” implementation steps: | - Identify control inputs and control outputs - Make a control signal table for each cycle - Derive control logic from the control table |
Control inputs: | Opcode (6 bits) |
Control outputs: | - RegDst - ALUSrc - MemtoReg - RegWrite - MemRead - MemWrite - Branch - Jump - ALUctr |
PCSrc Should be set (1) ... | if the instruction is branch on equal (beq), and the Zero output of the ALU is true |
What is pipelining? | - Multiple instructions are overlapped in execution - Key to making processors fast is to make everything work in parallel |
Speedup due to pipelining | the number of steps in the pipeline |
Classic MIPS Instructions: Fetch | Fetch instruction from memory |
Classic MIPS Instructions: Read | Read registers while decoding the instruction |
Classic MIPS Instructions: Execute | Execute the operation or calculate an address |
Classic MIPS Instructions: Access | Access an operand in data memory |
Classic MIPS Instructions: Write | Write the result into a register |
Pipeline Hazards: Structural hazards | attempt to use same resource twice |
Pipeline Hazards: Data hazards | attempt to use data before it is ready |
Pipeline Hazards: Control hazards | attempt to make decision before condition is evaluated |
How to we prevent Pipeline Hazards? | Can always resolve hazards by waiting |
Data Hazards: Forwarding: | - connect new value directly to next stage - getting the missing item early from the internal resources |
Forwarding paths are valid, | Only if the destination stage is later in time than the source stage. |
Reordering instructions | - Change the order of instructions to make sure the data is ready when the instruction is used - Use compiler to avoid hazards |
Control Hazards: Delayed Branch | - Always execute the next sequential instruction, with the branch taking place after that one instruction delay - Place an instruction immediately that is NOT affected by the branch |
Instruction Fetch: Pipelined Datapath: Instructions and data | Move generally from left to right through the 5 stages |
Instruction Fetch: Pipeline registers | - To retain the values of each stage for the next stage - Must be wide enough to store all the data corresponding to the lines that go through them |
Instruction Fetch: The PC address is incremented by.... | 4 |
Instruction Fetch: The instruction is read from... | Memory using the address in the PC and then placed in the IF/ID pipeline register |
Instruction Fetch: Then written back into the PC to... | Be ready for the next clock cycle |
Instruction fetch: The computer cannot know... | What type of instruction is being fetched |
The IF/ID pipeline register supplies... | The 16-bit immediate field and the register numbers to read |
All instructions advance during...from pipeline step to the next | each clock cycle |
Pipelined Datapath: 2 exceptions | - Place the result back into the register file - Selection of the next value of the PC |
Execution: The contents of register 1 and the sign-extended immediate from the ... | ID/EX pipeline register and adds them using the ALU |
Execution: The sum is placed in the | EX/MEM pipeline register |
Memory: The instruction loads the data into the... | MEM/WB pipeline register |
lw instruction: Any information needed in a later pipe stage must be passed to that stage via a... | pipeline register |
sw instruction: Steps to implement | Instruction fetch -> Instruction decode and register file read -> Execute and address calculation -> Memory access -> Write back |
What happens in the Write back stage during the sw instruction? | Nothing |
Why does the sw stage passes the writeback stage when nothing happens? | Because later instructions are already progressing at the maximum rate |
What a bug happens when the load word instruction is run? | IF/ID register pipeline register will be overwritten by following instructions, the destination register number is not preserved |
What is the solution to the bug in the load word instruction? | Pass the register number, From the ID/EX through EX/MEM to the MEM/WB pipeline register for use in the WB stage |
Why is pipelining difficult to understand? | Many instructions are simultaneously executing in a single datapath in every clock cycle |
Graphical representation of pipelining? | - Multiple-clock-cycle pipeline diagrams - Single-clock-cycle pipeline diagrams |
Single-clock-cycle pipeline diagrams: | Show the details of what is happening |
MIPS architecture was... | Designed to be pipelined |
Pipelining in MIPS | - Simple instruction format (makes IF, ID easy) - Memory operations only in lw, sw instructions (simplifies EX) - Memory operands aligned in memory (simplifies MEM) - Single value for write-back (limits forwarding) |
Pipelining in MIPS: Simple instruction format (makes IF, ID easy) | - Single-word instructions - Small number of instruction formats - Common fields in same place (e.g., rs, rt) in different formats |
There are NO separate write signals for, | The pipeline register |
If the PC is written on each clock cycle? | There are NO separate write signals |
Adding control: Basic approach | - Build on single-cycle control - Place control unit in ID stage - Pass control signals to following stages |
Adding control: Later approaches | - Extra features to deal with - Data forwarding; Stalls; Exceptions |
Pipelined Control: IF | Nothing special |
Pipelined Control: ID | No optional control lines to set |
Pipelined Control: EX | RegDst, ALUOp, ALUSrc |
Pipelined Control: MEM | Branch, MemRead, MemWrite |
Pipelined Control: WB | MemtoReg, RegWrite |
Implementing control | Setting the nine control lines to the values in each stage for each instruction |
Control lines start with the... | EX stage |
Forwarding | - The write is in the first half of the clock cycle - The read is in the second half, so the read delivers what is written -The result is available at the end of EX stage in first instruction (or clock cycle 3) |
Controlling Forwarding | First, detect a hazard and then, forward the proper value to resolve the hazard |
Data Hazard Detection: "EX" hazard | - EX/MEM - test whether instruction writes register file and examine rd register - ID/EX - test whether instruction reads rs or rt register and matches rd register in EX/MEM |
"EX" hazard: Two pairs of data hazard conditions are | 1a. EX/MEM.RegisterRd == ID/EX.RegisterRs 1b. EX/MEM.RegisterRd == ID/EX.RegisterRt |
Data Hazard Detection: "MEM" hazard | - MEM/WB - test whether instruction writes register file and examine rd (rt) register - ID/EX - test whether instruction reads rs or rt register and matches rd (rt) register in EX/MEM |
"MEM" hazard: Two pairs of data hazard conditions are | 2a. MEM/WB.RegisterRd == ID/EX.RegisterRs 2b. MEM/WB.RegisterRd == ID/EX.RegisterRt |
Add hardware to feed back... | ALU and MEM results to both ALU inputs |
If we can take the inputs to the ALU from any pipeline register rather than just ID/EX, | Forward the proper data |
The forwarding control will be in the... | EX Stage |
Why is the forwarding control will be in the EX stage? | Because the ALU forwarding multiplexers are found in that stage |
We can take the inputs to the ALU from...rather than just... | any pipeline register, ID/EX |
What if... Double Hazards? | The result is forwarded from the MEM stage |
Why is the result is forwarded from the MEM stage? | Because the result in the MEM stage is the more recent result |
When an instruction tries to read a register following a load instruction that writes the same register even with forwarding... | Cannot avoid a pipeline stall |
Hazard Detection Unit | Operate during the ID stage: It can insert the stall between the load and its use. |
Hazard Detection Unit: Test to see if, | Destination register of the load in the EX stage == source register of the instruction of ID stage |
To stall the pipeline... | To stall the pipeline |
if (we move the branch execution earlier in the pipeline) | then fewer instructions need be flushed. |
Two schemes for resolving control hazards: | - Assume branch NOT taken - Reducing the delay of branch |
If the branch is taken after assuming it is not... | - The instruction that are being fetched and decoded must be discarded. - Execution continues at the branch target. |
Many MIPS implementations move the branch execution to the ID stage. | 1 pipeline stall |
For branch equal, | Compare the two registers during the ID stage. |
Principle of Locality | Programs access a small proportion of their address space at any time |
Temporal locality | - If a data item is accessed, it is likely to be accessed again soon - e.g., instructions in a loop, induction variables |
Spatial locality | - If a data item is accessed, data items whose addresses are close by are likely to be accessed soon - e.g., sequential instruction access, array data |
Implement the memory of a computer as a | memory hierarchy, |
What is attached to the CPU? | Cache memory |
A level closer to the processor is a... | subset of any level further away |
All the data are stored at the... | lowest level |
Data is copied between... | only two adjacent levels at a time |
Upper Level Memory | Faster and smaller |
Lower Level Memory | lower and bigger |
Block | the minimum unit of information |
Hit | if the data requested is found in the upper level |
Miss | if the data is not found |
Hit rate (hit ratio) | the fraction of memory accesses found in the upper level |
Miss rate | (1 – hit rate) |
Hit time | Time to access the upper level of the memory hierarchy |
Miss penalty | (Time to replace a block in the upper level with the corresponding block from the lower level) + (the time to deliver this block to the processor) |
Cache | - Located between the CPU and main memory - Contains a collection of recent references: - Takes advantage of locality of access |
Principle of locality | Programs access a relatively small portion of their address space at any instant of time |
Direct Mapped | - When assign the cache location, - Based on the address of the word in the memory - Simplest way to assign a location in the cache. |
Number of cache entries | power of two |
Variations of Set Associative: Direct mapped: | one-way set-associative |
Variations of Set Associative: Fully associative with m entries: | m-way set-associative |
Variations of Set Associative: n: | degree of associativity |
Variations of Set Associative: Advantage of increasing the degree | Decrease the miss rate |
Variations of Set Associative: Disadvantage of increasing the degree: | Increase the hit time |
Read-stall clock cycles | (Reads / Program) x Read-stall x clock cycles |
Write-stall clock cycles (for write-through) | ((Write/Program) x Write miss rate x Write miss penalty ) + Write buffer stall |
Larger blocks exploit spatial locality to... | lower miss rates |
But increasing the block size results in what problem? | Increase the miss penalty |
Cache inconsistency | - Write the data into only the data cache (without changing main memory) - Then, memory would have a different value from that in the cache. |
Reduce the miss penalty? | Increase the bandwidth from the memory to cache. |
Increase the bandwidth to memory | - Widen the memory and the buses between the processor and memory - Parallel access to all the words of the block |
The memory chips are organized in? | Banks to read or write multiple words in one access time. |
CPU time | (CPU execution clock cycles + Memory-stall clock cycles) x Clock cycle time |
Memory-stall cycles | (Memory access/ program) x Miss rate x Miss penalty or (Instructions/ program) x (Misses/ Instructions) x Miss penalty |
Instruction Set Architecture (ISA) | The one “true” language of a machine, Boundary between hardware and software, The hardware’s specification, and Defines “what” a machine does |
Machine Organization | The “guts” of the machine, “how” the hardware works, The implementation, and Must obey the ISA abstraction |
Application software | Written in high-level language |
System software | - Compiler: translates HLL code to machine code - Operating System: service code |
Hardware | Processor, memory, I/O controllers |
RISC philosophy: | - Fixed instruction lengths - Limited addressing modes Program is longer but takes shorter to interpret - Limited operations |
CISC philosophy: | - Increased capability of each instruction - Lead to more addressing modes - Variable length instructions - Instructions execution time variable Program is shorter but takes longer to interpret |
Hardware performance | - Time, Time, & Time - Key to the effectiveness of an entire system of hardware and software - Critical to purchasers, and to designers |
Response time (or execution time) | - Time between the start and completion of a task - Important to individual users |
Throughput | - Total amount of work done in a given time - Important to data center managers |
Decreasing response time almost always, | Improve throughput |
Elapse time | - counts everything (disk and memory accesses, I/O , etc.) - a useful number, but often not good for comparison purposes |
CPU time (CPU execution time), | - doesn't count I/O or time spent running other programs - can be broken up into system time, and user time |