How to determine the length of single clock cycle in a single-cycle datapath?

The clock cycle is determined by the longest possible path (or the slowest operation, lw) in the machine

List five pipeline stages in a single-cycle datapath.

IF(Instruction Fetch), ID(Instruction Decode), EX(Execute / Address Calculation), MEM(Memory Access (read / write), and WB (Write Back (results into register file)

Two's complement operation,

Subtraction using addition of negative numbers

- Addition & subtraction - Need to know number representations - Integer and floating point arithmetic

How do you know there is no overflow?

- When adding a positive and a negative number - When signs are the same for subtraction

How do we know when there is an overflow

Overflow occurs when the value affects the sign:

Examples of overflow:

-When adding two positives yields a negative -Adding two negatives gives a positive, or -Subtract a negative from a positive and get a negative, or -Subtract a positive from a negative and get a positive,

Loads a byte and stores the sign-extended version in a word

Loads a byte and stores it in a word

Drawback of integer binary multiplication

- Half of 64-bit multiplicand register are zeros - Half of 64 bit adder is adding zeros

Solution to the drawback of integer binary multiplication

- Shift product right instead of multiplicand left - Only left half of product register added to multiplicand

Steps in hardware integer division:

- Shift the dividend left one position - Subtract the divisor from the left half of the dividend - if (result positive), Shift left a 1 into the quotient; - else, Shift left a 0 into the quotient; - Once the result is positive, Repeat the process for

Signed magnitude representation

(-1)^S x F x B^E, S = sign (either 0 (pos) or 1 (neg)) F = fraction (also called the mantissa) B = base (implied) (Ex, 2 or 10) E = exponent

Overflow in Signed magnitude representation

The exponent is too large to be represented in the exponent field

Underflow in Signed magnitude representation

The negative exponent is too large to fit in the exponent field

normalized numbers in binary format: single precision

The significand is 24 bits long (1 + 23-bit fraction)

normalized numbers in binary format: double precision

The significand is 53 bits long (1 + 52-bit fraction)

Instruction formats: R-type

Instructions like registers and words of data, are 32 bits long Ex: add $t0, $s1, $s2

Instruction formats: I-type

Load/Store instruction format Ex: lw $t1, 24($s2)

Instruction formats: J-type

Jump formats Ex: jal 0x00400030

Instruction implementation: PC goes to

instruction memory, fetch instruction

The functional units in the MIPS: Combinational

- Elements that operate on data values - Output is a function of input

The functional units in the MIPS: Sequential

- Elements that contain state - Output depends on both inputs and the contents of the internal state

Logic Design Conventions: Sequential Elements: Register

- stores data in a circuit - Uses a clock signal to determine when to update the stored value - Edge-triggered: update when Clk changes from 0 to 1

Logic Design Conventions: Sequential Elements: Register with write control

- Only update on clock edge when write control input is 1 - Used when stored value is required later

- When signals can be read & when they can be written - Edge-triggered clocking – update only on a clock edge

stores the instructions of a program

keeps the address of the instruction

increment the PC or jump to the address of the next instruction

To read one word from a register?

Need an input (register number) and one output (data)

To write one word from a register?

Need two inputs (register number & data)

 No datapath resource can be used more than once per instruction?

A separate instruction memory is required

Multiplexor (Data Selector):

- Allow multiple connections to the input of an element - A control signal selects among the inputs

The key differences between R-type and memory-reference (lw, sw) are:

- The second input to the ALU is either a register (R-type) or the sign-extended half of the instruction (memory) - The value stored into a destination register comes from either the ALU (R-type) or the memory (lw? or sw?)

Help

Options

focusNode

Didn't know it?
click below

Knew it?
click below

Don't Know

Remaining cards (0)

Know

retry

shuffle

restart

0:00

Embed Code - If you would like this activity on your web page, copy the script below and paste it into your web page.

Normal Size Small Size show me how

CS 3375

Question	Answer
The performance of a machine is determined by instruction count (IC), clock cycles per instruction (CPI), and clock cycle time. Here, IC is determined by compiler and [ ? ].	Instruction Set Architecture
In a single cycle data path, every instruction begins on one clock edge and completes execution on the next clock edge. So the CPI is [ ? ]	CPI = 1
In 6 by 1 multiplexer, how many selector bits do we need?	3 bits
How to determine the length of single clock cycle in a single-cycle datapath?	The clock cycle is determined by the longest possible path (or the slowest operation, lw) in the machine
Pipeline rate is limited by [slowest? or fastest?] pipeline stage.	Slowest
List five pipeline stages in a single-cycle datapath.	IF(Instruction Fetch), ID(Instruction Decode), EX(Execute / Address Calculation), MEM(Memory Access (read / write), and WB (Write Back (results into register file)
Pipelining improves performance by increasing instruction throughput and decreasing the execution time of an individual instruction.	False
Two's complement operation,	Subtraction using addition of negative numbers
Arithmetic function:	- Addition & subtraction - Need to know number representations - Integer and floating point arithmetic
Logical operations:	- and - or - not
How do you know there is no overflow?	- When adding a positive and a negative number - When signs are the same for subtraction
How do we know when there is an overflow	Overflow occurs when the value affects the sign:
Examples of overflow:	-When adding two positives yields a negative -Adding two negatives gives a positive, or -Subtract a negative from a positive and get a negative, or -Subtract a positive from a negative and get a positive,
MIPS commands that don't detect overflow	addu, addiu, subu
lb command	Loads a byte and stores the sign-extended version in a word
lbu command	Loads a byte and stores it in a word
Drawback of integer binary multiplication	- Half of 64-bit multiplicand register are zeros - Half of 64 bit adder is adding zeros
Solution to the drawback of integer binary multiplication	- Shift product right instead of multiplicand left - Only left half of product register added to multiplicand
Steps in hardware integer division:	- Shift the dividend left one position - Subtract the divisor from the left half of the dividend - if (result positive), Shift left a 1 into the quotient; - else, Shift left a 0 into the quotient; - Once the result is positive, Repeat the process for
Signed magnitude representation	(-1)^S x F x B^E, S = sign (either 0 (pos) or 1 (neg)) F = fraction (also called the mantissa) B = base (implied) (Ex, 2 or 10) E = exponent
Overflow in Signed magnitude representation	The exponent is too large to be represented in the exponent field
Underflow in Signed magnitude representation	The negative exponent is too large to fit in the exponent field
IEEE 754 floating-point standard: normalized numbers in binary format	(-1)^s x (1 + Fraction) x 2^E
normalized numbers in binary format: single precision	The significand is 24 bits long (1 + 23-bit fraction)
normalized numbers in binary format: double precision	The significand is 53 bits long (1 + 52-bit fraction)
normalized numbers in binary format: single precision exponential bias	Exponential Bias = 127
normalized numbers in binary format: double precision exponential bias	Exponential Bias = 1023
How is the Instruction count determined?	Determined by ISA and compiler
Instruction formats: R-type	Instructions like registers and words of data, are 32 bits long Ex: add $t0, $s1, $s2
Instruction formats: I-type	Load/Store instruction format Ex: lw $t1, 24($s2)
Instruction formats: J-type	Jump formats Ex: jal 0x00400030
Instruction implementation: PC goes to	instruction memory, fetch instruction
Instruction implementation:Register numbers goes to	register file, read registers
In R - type instructions format, use ALU to calculate	Arithmetic result
In I - type instructions format, use ALU to calculate	Memory address for load/store
In J- type instructions format, use ALU to calculate	Branch target address
The functional units in the MIPS: Combinational	- Elements that operate on data values - Output is a function of input
The functional units in the MIPS: Sequential	- Elements that contain state - Output depends on both inputs and the contents of the internal state
Logic Design Conventions: Sequential Elements: Register	- stores data in a circuit - Uses a clock signal to determine when to update the stored value - Edge-triggered: update when Clk changes from 0 to 1
Logic Design Conventions: Sequential Elements: Register with write control	- Only update on clock edge when write control input is 1 - Used when stored value is required later
Clocking Methodology	- When signals can be read & when they can be written - Edge-triggered clocking – update only on a clock edge
instruction memory	stores the instructions of a program
program counter (PC)	keeps the address of the instruction
adder	increment the PC or jump to the address of the next instruction
To read one word from a register?	Need an input (register number) and one output (data)
To write one word from a register?	Need two inputs (register number & data)
Single cycle datapath: The implementation attempts to execute all instructions in 1 clock cycle	CPI is always 1
Any element needed more than once must be	duplicated
 No datapath resource can be used more than once per instruction?	A separate instruction memory is required
Multiplexor (Data Selector):	- Allow multiple connections to the input of an element - A control signal selects among the inputs
The key differences between R-type and memory-reference (lw, sw) are:	- The second input to the ALU is either a register (R-type) or the sign-extended half of the instruction (memory) - The value stored into a destination register comes from either the ALU (R-type) or the memory (lw? or sw?)
Single Cycle Datapath & Control Unit equation	CPU = Datapath + Control
Single-cycle design in a Single Cycle Datapath & Control Unit:	- Instruction takes exactly one clock cycle (CPI = 1) - Datapath units used only once per cycle - Writable state updated at end of cycle
What must be “controlled” in a Single Cycle Datapath & Control Unit?	- Multiplexors - Writable state elements: * Register File, Data Memory - ALU (which operation?) – 4 bits
Control unit” implementation steps:	- Identify control inputs and control outputs - Make a control signal table for each cycle - Derive control logic from the control table
Control inputs:	Opcode (6 bits)
Control outputs:	- RegDst - ALUSrc - MemtoReg - RegWrite - MemRead - MemWrite - Branch - Jump - ALUctr
PCSrc Should be set (1) ...	if the instruction is branch on equal (beq), and the Zero output of the ALU is true
What is pipelining?	- Multiple instructions are overlapped in execution - Key to making processors fast is to make everything work in parallel
Speedup due to pipelining	the number of steps in the pipeline
Classic MIPS Instructions: Fetch	Fetch instruction from memory
Classic MIPS Instructions: Read	Read registers while decoding the instruction
Classic MIPS Instructions: Execute	Execute the operation or calculate an address
Classic MIPS Instructions: Access	Access an operand in data memory
Classic MIPS Instructions: Write	Write the result into a register
Pipeline Hazards: Structural hazards	attempt to use same resource twice
Pipeline Hazards: Data hazards	attempt to use data before it is ready
Pipeline Hazards: Control hazards	attempt to make decision before condition is evaluated
How to we prevent Pipeline Hazards?	Can always resolve hazards by waiting
Data Hazards: Forwarding:	- connect new value directly to next stage - getting the missing item early from the internal resources
Forwarding paths are valid,	Only if the destination stage is later in time than the source stage.
Reordering instructions	- Change the order of instructions to make sure the data is ready when the instruction is used - Use compiler to avoid hazards
Control Hazards: Delayed Branch	- Always execute the next sequential instruction, with the branch taking place after that one instruction delay - Place an instruction immediately that is NOT affected by the branch
Instruction Fetch: Pipelined Datapath: Instructions and data	Move generally from left to right through the 5 stages
Instruction Fetch: Pipeline registers	- To retain the values of each stage for the next stage - Must be wide enough to store all the data corresponding to the lines that go through them
Instruction Fetch: The PC address is incremented by....	4
Instruction Fetch: The instruction is read from...	Memory using the address in the PC and then placed in the IF/ID pipeline register
Instruction Fetch: Then written back into the PC to...	Be ready for the next clock cycle
Instruction fetch: The computer cannot know...	What type of instruction is being fetched
The IF/ID pipeline register supplies...	The 16-bit immediate field and the register numbers to read
All instructions advance during...from pipeline step to the next	each clock cycle
Pipelined Datapath: 2 exceptions	- Place the result back into the register file - Selection of the next value of the PC
Execution: The contents of register 1 and the sign-extended immediate from the ...	ID/EX pipeline register and adds them using the ALU
Execution: The sum is placed in the	EX/MEM pipeline register
Memory: The instruction loads the data into the...	MEM/WB pipeline register
lw instruction: Any information needed in a later pipe stage must be passed to that stage via a...	pipeline register
sw instruction: Steps to implement	Instruction fetch -> Instruction decode and register file read -> Execute and address calculation -> Memory access -> Write back
What happens in the Write back stage during the sw instruction?	Nothing
Why does the sw stage passes the writeback stage when nothing happens?	Because later instructions are already progressing at the maximum rate
What a bug happens when the load word instruction is run?	IF/ID register pipeline register will be overwritten by following instructions, the destination register number is not preserved
What is the solution to the bug in the load word instruction?	Pass the register number, From the ID/EX through EX/MEM to the MEM/WB pipeline register for use in the WB stage
Why is pipelining difficult to understand?	Many instructions are simultaneously executing in a single datapath in every clock cycle
Graphical representation of pipelining?	- Multiple-clock-cycle pipeline diagrams - Single-clock-cycle pipeline diagrams
Single-clock-cycle pipeline diagrams:	Show the details of what is happening
MIPS architecture was...	Designed to be pipelined
Pipelining in MIPS	- Simple instruction format (makes IF, ID easy) - Memory operations only in lw, sw instructions (simplifies EX) - Memory operands aligned in memory (simplifies MEM) - Single value for write-back (limits forwarding)
Pipelining in MIPS: Simple instruction format (makes IF, ID easy)	- Single-word instructions - Small number of instruction formats - Common fields in same place (e.g., rs, rt) in different formats
There are NO separate write signals for,	The pipeline register
If the PC is written on each clock cycle?	There are NO separate write signals
Adding control: Basic approach	- Build on single-cycle control - Place control unit in ID stage - Pass control signals to following stages
Adding control: Later approaches	- Extra features to deal with - Data forwarding; Stalls; Exceptions
Pipelined Control: IF	Nothing special
Pipelined Control: ID	No optional control lines to set
Pipelined Control: EX	RegDst, ALUOp, ALUSrc
Pipelined Control: MEM	Branch, MemRead, MemWrite
Pipelined Control: WB	MemtoReg, RegWrite
Implementing control	Setting the nine control lines to the values in each stage for each instruction
Control lines start with the...	EX stage
Forwarding	- The write is in the first half of the clock cycle - The read is in the second half, so the read delivers what is written -The result is available at the end of EX stage in first instruction (or clock cycle 3)
Controlling Forwarding	First, detect a hazard and then, forward the proper value to resolve the hazard
Data Hazard Detection: "EX" hazard	- EX/MEM - test whether instruction writes register file and examine rd register - ID/EX - test whether instruction reads rs or rt register and matches rd register in EX/MEM
"EX" hazard: Two pairs of data hazard conditions are	1a. EX/MEM.RegisterRd == ID/EX.RegisterRs 1b. EX/MEM.RegisterRd == ID/EX.RegisterRt
Data Hazard Detection: "MEM" hazard	- MEM/WB - test whether instruction writes register file and examine rd (rt) register - ID/EX - test whether instruction reads rs or rt register and matches rd (rt) register in EX/MEM
"MEM" hazard: Two pairs of data hazard conditions are	2a. MEM/WB.RegisterRd == ID/EX.RegisterRs 2b. MEM/WB.RegisterRd == ID/EX.RegisterRt
Add hardware to feed back...	ALU and MEM results to both ALU inputs
If we can take the inputs to the ALU from any pipeline register rather than just ID/EX,	Forward the proper data
The forwarding control will be in the...	EX Stage
Why is the forwarding control will be in the EX stage?	Because the ALU forwarding multiplexers are found in that stage
We can take the inputs to the ALU from...rather than just...	any pipeline register, ID/EX
What if... Double Hazards?	The result is forwarded from the MEM stage
Why is the result is forwarded from the MEM stage?	Because the result in the MEM stage is the more recent result
When an instruction tries to read a register following a load instruction that writes the same register even with forwarding...	Cannot avoid a pipeline stall
Hazard Detection Unit	Operate during the ID stage: It can insert the stall between the load and its use.
Hazard Detection Unit: Test to see if,	Destination register of the load in the EX stage == source register of the instruction of ID stage
To stall the pipeline...	To stall the pipeline
if (we move the branch execution earlier in the pipeline)	then fewer instructions need be flushed.
Two schemes for resolving control hazards:	- Assume branch NOT taken - Reducing the delay of branch
If the branch is taken after assuming it is not...	- The instruction that are being fetched and decoded must be discarded. - Execution continues at the branch target.
Many MIPS implementations move the branch execution to the ID stage.	1 pipeline stall
For branch equal,	Compare the two registers during the ID stage.
Principle of Locality	Programs access a small proportion of their address space at any time
Temporal locality	- If a data item is accessed, it is likely to be accessed again soon - e.g., instructions in a loop, induction variables
Spatial locality	- If a data item is accessed, data items whose addresses are close by are likely to be accessed soon - e.g., sequential instruction access, array data
Implement the memory of a computer as a	memory hierarchy,
What is attached to the CPU?	Cache memory
A level closer to the processor is a...	subset of any level further away
All the data are stored at the...	lowest level
Data is copied between...	only two adjacent levels at a time
Upper Level Memory	Faster and smaller
Lower Level Memory	lower and bigger
Block	the minimum unit of information
Hit	if the data requested is found in the upper level
Miss	if the data is not found
Hit rate (hit ratio)	the fraction of memory accesses found in the upper level
Miss rate	(1 – hit rate)
Hit time	Time to access the upper level of the memory hierarchy
Miss penalty	(Time to replace a block in the upper level with the corresponding block from the lower level) + (the time to deliver this block to the processor)
Cache	- Located between the CPU and main memory - Contains a collection of recent references: - Takes advantage of locality of access
Principle of locality	Programs access a relatively small portion of their address space at any instant of time
Direct Mapped	- When assign the cache location, - Based on the address of the word in the memory - Simplest way to assign a location in the cache.
Number of cache entries	power of two
Variations of Set Associative: Direct mapped:	one-way set-associative
Variations of Set Associative: Fully associative with m entries:	m-way set-associative
Variations of Set Associative: n:	degree of associativity
Variations of Set Associative: Advantage of increasing the degree	Decrease the miss rate
Variations of Set Associative: Disadvantage of increasing the degree:	Increase the hit time
Read-stall clock cycles	(Reads / Program) x Read-stall x clock cycles
Write-stall clock cycles (for write-through)	((Write/Program) x Write miss rate x Write miss penalty ) + Write buffer stall
Larger blocks exploit spatial locality to...	lower miss rates
But increasing the block size results in what problem?	Increase the miss penalty
Cache inconsistency	- Write the data into only the data cache (without changing main memory) - Then, memory would have a different value from that in the cache.
Reduce the miss penalty?	Increase the bandwidth from the memory to cache.
Increase the bandwidth to memory	- Widen the memory and the buses between the processor and memory - Parallel access to all the words of the block
The memory chips are organized in?	Banks to read or write multiple words in one access time.
CPU time	(CPU execution clock cycles + Memory-stall clock cycles) x Clock cycle time
Memory-stall cycles	(Memory access/ program) x Miss rate x Miss penalty or (Instructions/ program) x (Misses/ Instructions) x Miss penalty
Instruction Set Architecture (ISA)	The one “true” language of a machine, Boundary between hardware and software, The hardware’s specification, and Defines “what” a machine does
Machine Organization	The “guts” of the machine, “how” the hardware works, The implementation, and Must obey the ISA abstraction
Application software	Written in high-level language
System software	- Compiler: translates HLL code to machine code - Operating System: service code
Hardware	Processor, memory, I/O controllers
RISC philosophy:	- Fixed instruction lengths - Limited addressing modes Program is longer but takes shorter to interpret - Limited operations
CISC philosophy:	- Increased capability of each instruction - Lead to more addressing modes - Variable length instructions - Instructions execution time variable Program is shorter but takes longer to interpret
Hardware performance	- Time, Time, & Time - Key to the effectiveness of an entire system of hardware and software - Critical to purchasers, and to designers
Response time (or execution time)	- Time between the start and completion of a task - Important to individual users
Throughput	- Total amount of work done in a given time - Important to data center managers
Decreasing response time almost always,	Improve throughput
Elapse time	- counts everything (disk and memory accesses, I/O , etc.) - a useful number, but often not good for comparison purposes
CPU time (CPU execution time),	- doesn't count I/O or time spent running other programs - can be broken up into system time, and user time

Created by: Valitine

Popular Miscellaneous sets

Review Words Vocabulary

CHS Root Words 10

CHS Root Words 02

"Know" box contains:
Time elapsed:
Retries: