Lecture6BranchPrediction:讲座6分支预测

发布时间：2023-12-06 作者：admin 来源：讲座

2023年12月6日发(作者：)

-

Lecture 6

Branch Prediction

Why do we want to predict branches?

• MIPS based pipeline – 1 instruction issued per cycle, branch

hazard of 1 cycle.

– Delayed branch

• Modern processor and next generation – multiple instructions

issued per cycle, more branch hazard cycles will incur.

– Cost of branch misfetch goes up

– Pentium Pro – 3 instructions issued per cycle, 12+ cycle misfetch penalty

 HUGE penalty for a misfetched path following a branch

1 Branch Prediction

• Static Prediction

– Always taken, always not taken

– Opcode based

– Displacement based (forward not taken, backward taken)

– Compiler directed (branch likely, branch not likely)

• Dynamic Prediction

– 1 bit predictor – remember last taken/not taken per branch

 Use a branch-prediction buffer or branch-history table with 1 bit entry

 Use part of the PC (low-order bits) to index buffer/table – Why?

– Multiple branches may share the same bit

 Invert the bit if the prediction is wrong

 Backward branches for loops will be mispredicted twice

-- once upon loop exit and then again on loop entry

2-bit Branch Prediction

• Has 4 states instead of 2

• A prediction must miss twice before it is changed

• Good for backward branches of loops

2 Branch History Table

• Has limited size

• 2 bits by N (e.g. 4K entries)

• Uses low-order bits of branch PC to

choose entry

• Plot misprediction instead of prediction

branch PC

BHT

Observations

• Prediction Accuracy ranges from 99% to 82% or a misprediction

rate of 1% to 18%

• Misprediction for integer programs (gcc, espresso, eqntott, li) is

substantially higher than FP programs (nasa7, matrix300,

tomcatv, doduc, spice, fppp)

• Branch penalty involves both misprediction rate and branch

frequency, and is higher for integer benchmarks

• Prediction accuracy improves with buffer size, but does not

improve beyond 4K entries

3 Correlating or Two-level Predictors

• Correlating branch predictors also look at other branches for clues.

Consider the following example.

if (aa==2) -- branch b1

aa = 0;

if (bb==2) --- branch b2

bb = 0;

if(aa!=bb) { … --- branch b3 – Clearly depends on the results of b1 and

Prediction if the last branch is NT

Prediction if the last branch is T

(1,1) predictor – uses history of 1 branch and uses a 1-bit predictor

Another Example

If (d==0)

d=1;

If (d=1) ---

Code Sequence assuming d is assigned to R1:

BNEZ R1, L1 ; branch b1 (d!=0)

DADDU R1,R0,#1 ; d==0, so d=1

L1: DADDIU R3,R1,#-1

BNEZ R3,L2 ; branch b2 (d!=1)

Possible Execution Sequence for the code fragment;

Initial d d==0? B1 d before b2 d==1? b2

0 yes not taken 1 yes not taken

1  no taken 1 yes not taken

2  no taken 2 no taken

Clearly, if b1 is not taken b2 will not be taken => correlation

4 Correlating Branch Predictor

• If we use 2 branches as histories, then there are 4 possibilities

(T-T, NT-T, NT-NT, NT-T).

• For each possibility, we need to use a predictor (1-bit or 2-bit).

• And this repeats for every branch.

(2,2) branch prediction

Performance of Correlating Branch Prediction

• With same number of

state bits, (2,2) performs

better than non-correlating 2-bit

predictor.

• Outperforms a 2-bit

predictor with infinite

number of entries

5 General (m,n) Branch Predictors

• The global history register is an m-bit shift register that records

the last m branches encountered by the processor

• Usually use both the PC address and the GHR (2-level)

m-bit ghr

n-bit predictors

Combining

funciton

Is Branch Predictor Enough?

• When is using branch prediction beneficial?

– When the outcome is known later than the target

• If we predict the branch as taken, and suppose that is correct,

what is the target address?

– Need a mechanism to provide target address as well

• Can we eliminate the one cycle delay for the 5-stage pipeline?

– Need to fetch from branch target immediately after branch

6 Branch Target Buffer (BTB)

BTB is a cache that contains the predicted PC value for

taken branches.

Is the current instruction a branch ?

•

BTB provides the answer before the current instruction is decoded

and therefore enables fetching to begin after IF-stage .

What is the branch target ?

•

BTB provides the branch target if the prediction is a taken direct

branch (for not taken branches the target is simply PC+4 ) .

BTB

7 BTB operations

BTB Performance

• Two things can go wrong

– BTB miss (misfetch)

– Mispredicted a branch (mispredict)

• Suppose for branches, BTB hit rate of 85% and predict accuracy

of 90%, misfetch penalty of 2 cycles and mispredict penalty of 5

cycles. What is the average branch penalty?

2*(15%) + 5*(85%*10%)

• BTB and BPT can be used together to perform better prediction –

for example 2 bit predictor can be used for branch prediction.

8 Integrated Instruction Fetch Unit

Separate out IF from the pipeline and integrate with the following

components. So, the pipeline consists of Issue, Read, EX, and WB

(scoreboarding) ; Or Issue, EX and WB stages (Tomasulo).

1. Integrate Branch Prediction – Branch predictor is part of the IFU.

2. Instruction Prefetch – Fetch instruction from instruction memory ahead of

PC computation with the help of branch predictor & store in a prefetch buffer.

3. Instruction Memory Access and Buffering - Keep on filling the Instruction

Queue independent of the execution => Decoupled Execution.

Branch Prediction Summary

• The better we predict, the lesser penalty we might incur

• 2-bit predictors capture branch behavior well

• Correlating predictors improve accuracy, particularly when

combined with 2-bit predictors

• Accurate branch prediction does no good if we don’t know there

was a branch to predict

• BTB identifies branches in IF stage

• BTB combined with branch prediction table

-

👁️ 阅读量：0

🔖 本文标签：

🔥 最新发布文章