The Intel Haswell microarchitecture, Part 1
In April this year Intel launched its first microprocessors based on the Ivy Bridge microarchitecture: the third-generation Core CPUs, which represented a slight evolution of the previous Sandy Bridge microarchitecture. Next year, however, Intel plans to take the plunge with the new Haswell microarchitecture, which will bring significant improvements.
The Haswell microarchitecture
Ivy Bridge was a slight refinement of Sandy Bridge: it improved performance per cycle by approximately 3 to 7% and reduced power consumption by 15 to 20%, with the 22nm manufacturing process being the main reason for the lower consumption. Haswell, by contrast, introduces substantial architectural improvements aimed at improving both performance per cycle and energy consumption.
While Haswell is not the completely new microarchitecture that had been rumored (it is still based on its predecessors), this does not mean that there are no major improvements. To begin with, the base architecture is unchanged: Haswell keeps the basic design that premiered in Sandy Bridge, so it is still built from modules (integer units, SIMD FPU, DMI bus, integrated memory controller and IGP) interconnected by an internal ring-shaped bus (Ring Interconnect), but each of the modules has received substantial improvements.
More performance per cycle
Intel has changed the architecture profoundly, beginning with the addition of two ports to the out-of-order execution engine, thanks to which it is now capable of executing up to eight instructions per cycle (Ivy Bridge executes up to six instructions per cycle). The aim of this change is to increase chip performance both in standard mode and in SMT (HyperThreading) mode.
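As a rough back-of-the-envelope check on the figures quoted above, the following Python sketch compares the two peak widths. The constant and function names are hypothetical, chosen only for illustration, and the result describes a theoretical ceiling rather than measured performance, since real code rarely sustains the peak rate.

```python
# Peak instructions per cycle as quoted in the article.
IVY_BRIDGE_PEAK_IPC = 6
HASWELL_PEAK_IPC = 8


def peak_width_gain(old: int, new: int) -> float:
    """Relative increase in peak issue width, expressed as a fraction."""
    return new / old - 1


gain = peak_width_gain(IVY_BRIDGE_PEAK_IPC, HASWELL_PEAK_IPC)
print(f"Peak width increase: {gain:.1%}")
```

Under these figures the peak width grows by about one third, far more than the single-digit per-cycle gains Ivy Bridge brought over Sandy Bridge, although how much of that ceiling turns into real performance depends on the workload.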
Although Intel does not mention it, we assume that Haswell keeps the dynamic data structures introduced in Ivy Bridge. Coupled with the chip's higher throughput per cycle, this should further reduce the performance penalty of running applications that are not optimized for HyperThreading while the feature is enabled, and should improve performance in applications that are optimized for it.
There is also a new and improved conditional branch prediction unit, an L2 cache (with a larger TLB) that has lower latency and twice the bandwidth, and a larger L1D (data) cache.
Intel adopts CMT (Haswell becomes modular)
As with AMD's Bulldozer microarchitecture, Intel's design embraces CMT (Cluster Multi-Threading). To do so it makes a deep modification to the architecture, adding an integer processing unit along with the corresponding additional storage and routing logic, so that both integer units can operate simultaneously.
That is, each Haswell core will have two integer processing units and one floating point unit, a design very similar to that used by the modules of the AMD Bulldozer, Piledriver, Steamroller and Excavator microarchitectures. Thanks to this, Haswell will nearly double the number of simultaneous tasks involving integer calculations.
Although Intel does not make it very clear, we assume that HyperThreading will also work alongside CMT by acting on the additional ALU in each core, so that a hypothetical quad-core Core i7-4770K (Haswell-DT) could run up to 16 integer processing threads simultaneously (from four to eight threads through CMT, and from eight to sixteen threads through HyperThreading).
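The thread arithmetic in that assumption can be sketched in a few lines of Python. The `CoreConfig` class and its field names are hypothetical, and the configuration values reflect the article's assumption about a quad-core part rather than anything confirmed by Intel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreConfig:
    """Hypothetical description of a chip's integer threading resources."""
    cores: int
    integer_units_per_core: int        # 2 when CMT is assumed
    smt_threads_per_integer_unit: int  # 2 when HyperThreading is assumed

    def integer_threads(self) -> int:
        """Simultaneous integer threads implied by this configuration."""
        return (self.cores
                * self.integer_units_per_core
                * self.smt_threads_per_integer_unit)


baseline = CoreConfig(cores=4, integer_units_per_core=1, smt_threads_per_integer_unit=1)
with_cmt = CoreConfig(cores=4, integer_units_per_core=2, smt_threads_per_integer_unit=1)
with_cmt_and_ht = CoreConfig(cores=4, integer_units_per_core=2, smt_threads_per_integer_unit=2)

for label, cfg in [("baseline", baseline), ("CMT", with_cmt), ("CMT + HT", with_cmt_and_ht)]:
    print(f"{label}: {cfg.integer_threads()} integer threads")
```

Each factor doubles the count independently, which is where the progression from four to eight to sixteen threads comes from.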
The new FMA units
The floating point unit is where we see Haswell's most dramatic changes over its predecessors: Intel abandons its two 128-bit SIMD units and debuts a new floating point unit (FPU) with support for FMA instructions. It is very similar to the one present in the AMD Bulldozer and Piledriver microarchitectures and, like them, is formed by two 128-bit FMA units per core.
When used together, the two 128-bit FMA units are capable of executing 256-bit AVX2 instructions (they can also operate individually and simultaneously in CMT mode), and they can handle both floating point and integer calculations. Like the FMA units of the AMD Piledriver microarchitecture, they execute FMA3 instructions. FMA3 units require fewer transistors than the FMA4 units used in the AMD Bulldozer microarchitecture because they use only three operands (FMA4 has four operands, which complicates the design of the unit).
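To make the operand difference concrete, here is a minimal Python model of the two forms. It is a simplified sketch, not real hardware behavior: the register names and the `fma3`/`fma4` helpers are hypothetical, and the real FMA3 encoding has several variants that choose which source is overwritten. The fused operation is emulated with exact rational arithmetic so that the result is rounded only once, which is the defining property of a fused multiply-add.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly, then round once to the nearest double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def fma3(regs: dict, dst_src: str, src2: str, src3: str) -> None:
    """Three-operand form: the destination is also the first source,
    so its previous value is overwritten."""
    regs[dst_src] = fused_multiply_add(regs[dst_src], regs[src2], regs[src3])


def fma4(regs: dict, dst: str, src1: str, src2: str, src3: str) -> None:
    """Four-operand form: a separate destination, all sources preserved."""
    regs[dst] = fused_multiply_add(regs[src1], regs[src2], regs[src3])


# Hypothetical register file with values chosen so that rounding matters.
regs = {"x0": 1 + 2**-30, "x1": 1 - 2**-30, "x2": -1.0, "x3": 0.0}

# Separate multiply and add: the product is rounded before the addition.
separate = regs["x0"] * regs["x1"] + regs["x2"]

fma4(regs, "x3", "x0", "x1", "x2")  # result in x3, x0..x2 untouched
fma3(regs, "x0", "x1", "x2")        # result replaces x0

print(separate, regs["x3"], regs["x0"])
```

In this model both forms produce the same value, and the only difference is whether a source register survives. The separate multiply and add loses the small term entirely because the intermediate product is rounded to 1.0 before the addition.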
Turbo Boost 3.0
As was leaked some time ago, Haswell has a voltage regulator integrated into the chip, which provides more precise adjustment and control. This lets Intel offer more aggressive operating frequencies in the variants aimed at high performance, and frequencies optimized for low power in the variants aimed at laptops.
Intel has also improved the response time of Turbo Boost by 25% compared to Ivy Bridge, which will provide better performance at the moment a workload demands more of it.
Increased energy efficiency
Intel brings an entirely new power management logic called Haswell Power Management, which adds a new power saving mode called S0ix, joining the traditional S0 (load) and S3/S4 (standby) modes. S0ix could be described as a sleep mode in which the system is nevertheless reported to the operating system as active, improving the response time when switching between load and idle.
S0ix acts on all Haswell units (not only on the CPU and GPU), and thanks to this new mode Haswell achieves energy savings of more than 20 times over Ivy Bridge (in sleep mode it requires only 5mW). With this, Intel aims to bring the tablet experience to laptops and desktops.
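Taken together, the two figures above imply a rough lower bound for the Ivy Bridge idle draw. The short Python sketch below works it out; it assumes that the 20x factor and the 5mW figure describe the same idle state, which the source does not state explicitly, and the names are hypothetical.

```python
# Figures quoted in the article.
HASWELL_SLEEP_MW = 5   # sleep-mode power for Haswell, in milliwatts
SAVINGS_FACTOR = 20    # "more than 20 times" compared to Ivy Bridge


def implied_baseline_mw(new_mw: float, factor: float) -> float:
    """Baseline power implied by a savings factor (a lower bound,
    since the article says the factor is 'more than' 20)."""
    return new_mw * factor


ivy_bridge_lower_bound_mw = implied_baseline_mw(HASWELL_SLEEP_MW, SAVINGS_FACTOR)
print(f"Implied Ivy Bridge idle draw: at least {ivy_bridge_lower_bound_mw} mW")
```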
In a few minutes we will publish the second part of this article on Intel's Haswell microarchitecture.