Building a superscalar machine

Thanks to branch prediction, we now have nice, full pipelines. But while we can make pipelines deeper still, the Intel Pentium 4 is evidence that building a pipeline more than 30 stages deep just hurts, no matter how clever you think you are about predicting branches. So we need some new technique for doing yet more work every cycle.

How about running two instructions at a time? Suppose we run two pipelines side by side on one chip, one operating at PC and the other at PC+1, with PC incremented by 2 every cycle. As long as the Nth and (N+1)th instructions don’t suffer from a data dependency, we can run the two entirely in parallel! Congratulations, that’s a superscalar architecture. In fact there is nothing more to superscalar than running two or more pipelines side by side.
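To make the pairing rule concrete, here is a minimal sketch in Python of an in-order, two-wide issue stage. The `Instr` type, `can_dual_issue`, and `count_cycles` are hypothetical names made up for illustration, and the model assumes every instruction completes in one cycle, so it only counts issue slots rather than simulating a real pipeline.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    dest: Optional[str]      # register written, or None if nothing is written
    srcs: Tuple[str, ...]    # registers read


def can_dual_issue(first: Instr, second: Instr) -> bool:
    """True if `second` may issue in the same cycle as `first`."""
    if first.dest is None:
        return True
    reads_result = first.dest in second.srcs   # second needs first's result
    same_target = first.dest == second.dest    # both write the same register
    return not (reads_result or same_target)


def count_cycles(program: List[Instr]) -> int:
    """Issue cycles on a two-wide in-order machine (latency is ignored)."""
    pc = 0
    cycles = 0
    while pc < len(program):
        has_next = pc + 1 < len(program)
        if has_next and can_dual_issue(program[pc], program[pc + 1]):
            pc += 2   # both pipelines get an instruction this cycle
        else:
            pc += 1   # the second pipeline sits idle this cycle
        cycles += 1
    return cycles


program = [
    Instr("r1", ("r2", "r3")),
    Instr("r4", ("r5", "r6")),   # independent of the previous one: pairs up
    Instr("r7", ("r1", "r4")),
    Instr("r8", ("r7",)),        # needs r7: cannot pair with the previous one
]
print(count_cycles(program))
```

The first two instructions share a cycle, while the last two form a dependency chain and each take a cycle of their own, so the second pipeline only helps when neighbouring instructions happen to be independent.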

However, the added parallelism makes the data hazard logic more complicated. Now we have to check that the instructions in all K parallel pipelines do not depend on results that are still being computed anywhere else in the parallel pipeline system. Yeah, this gets messy quickly.
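A rough sketch of why it gets messy, again in Python with made-up names (`issue_width`, `comparisons_needed`). It assumes no forwarding, so any register with a result still in flight blocks issue, and it assumes each instruction reads at most two registers. Real hardware does these checks with parallel comparators rather than a loop.

```python
from typing import List, Optional, Set, Tuple

# (destination register or None, source registers)
Instr = Tuple[Optional[str], Tuple[str, ...]]


def issue_width(group: List[Instr], in_flight: Set[str]) -> int:
    """How many leading instructions of `group` can issue together, in order."""
    pending = set(in_flight)   # registers whose results are not written back yet
    issued = 0
    for dest, srcs in group:
        if any(s in pending for s in srcs) or dest in pending:
            break              # hazard: this and all later instructions wait
        if dest is not None:
            pending.add(dest)
        issued += 1
    return issued


def comparisons_needed(k: int, srcs_per_instr: int = 2) -> int:
    """Source-vs-earlier-destination comparisons inside one K-wide group."""
    return srcs_per_instr * k * (k - 1) // 2


group = [
    ("r1", ("r2",)),
    ("r3", ("r4",)),
    ("r5", ("r1",)),   # reads r1, which the first instruction is still producing
    ("r6", ("r7",)),
]
print(issue_width(group, in_flight=set()))
print([comparisons_needed(k) for k in (2, 4, 8)])
```

Under these assumptions the comparison count grows roughly with the square of K, and that is before counting the checks against everything already in flight in later pipeline stages.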

Control flow gets messy fast too. If the Nth instruction is a branch and the (N+1)th instruction performs some operation, how do we synchronize the pipelines so that the (N+1)th instruction is not executed when the outcome of the Nth says it should be skipped? While it is possible to introduce the hardware-level synchronization required for a branch in one pipeline to squash operation(s) in another, doing so is troublesome. MIPS sidesteps this kind of squashing with its branch delay slot: the specification explicitly states that the instruction after a branch is always executed. For this reason it is common to see branch instructions in MIPS followed by a no-op :D
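Here is a small sketch of what "the instruction after a branch is always executed" means for a program. The three-instruction toy ISA below (`add`, `jmp`, `nop`) and the `run` function are invented for illustration and are not MIPS. The sketch also assumes the instruction in the delay slot is never itself a jump.

```python
from typing import List, Tuple

# Hypothetical toy ISA:
#   ("add", n)      adds n to a single accumulator
#   ("jmp", target) jumps unconditionally to instruction index `target`
#   ("nop",)        does nothing
Program = List[Tuple]


def run(program: Program, delay_slot: bool) -> int:
    """Interpret `program`, optionally executing the slot after each jump."""
    acc = 0
    pc = 0
    while pc < len(program):
        op = program[pc]
        if op[0] == "add":
            acc += op[1]
            pc += 1
        elif op[0] == "jmp":
            if delay_slot and pc + 1 < len(program):
                slot = program[pc + 1]
                if slot[0] == "add":   # the slot runs before the jump lands
                    acc += slot[1]
            pc = op[1]
        else:                          # nop
            pc += 1
    return acc


risky = [("add", 1), ("jmp", 3), ("add", 10), ("add", 100)]
padded = [("add", 1), ("jmp", 3), ("nop",), ("add", 100)]

print(run(risky, delay_slot=False), run(risky, delay_slot=True))
print(run(padded, delay_slot=False), run(padded, delay_slot=True))
```

With the `add` sitting right after the jump, the two machines disagree about the result; with a no-op in that position they agree. That is the reason for padding branches with a no-op whenever nothing useful can safely go in the slot.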

Well, now we’ve done all that we can with pipelines and prediction. We’ve tried multiple pipelines in parallel with limited success, and it seems like we’re going to have to take a more concerted stab at slaying the data dependency beast in order to fill our pipelines to the brim all the time. Until next time, when we slay the beast with out-of-order execution!