AMD’s new modular micro-architecture: Steamroller
Last year AMD released the first microprocessors based on Bulldozer, its first-generation modular micro-architecture: Zambezi (desktop, socket AM3+), Zurich (servers, socket AM3+), Valencia (2P servers, socket C32) and Interlagos (4P servers, socket G34). This year saw the arrival of the first microprocessors based on the second-generation modular micro-architecture, Piledriver: the second-generation AMD A-Series APUs known as “Trinity”. AMD is now giving details of what will be its third-generation modular architecture, Steamroller.
With Bulldozer, AMD introduced one of the most interesting micro-architectures seen in a long time, built around the innovative (but confusing) module concept. A module saves circuitry by combining two integer cores with a single shared floating point unit (FPU), a departure from the traditional core, which pairs one integer unit with its own floating point unit.
This modular design allowed AMD to offer microprocessors with up to 8 cores (up to 16 cores in servers). Unlike chips built from traditional cores, however, an 8-core part does not have 8 floating point units but “4 acting as 8”: each FPU splits its resources in half when both integer cores of its module are in use, which is why AMD calls it Flex-FP.
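The “4 acting as 8” idea can be illustrated with a small Python sketch. It is a simplified model, not AMD’s actual scheduling logic: it assumes each module’s FPU is made of two 128-bit FMAC units (256 bits in total) and that the width is split evenly when both integer cores are busy. The function names are hypothetical.

```python
# Simplified model of Flex-FP sharing inside one module.
# Assumption: the FPU exposes two 128-bit FMAC units (256 bits in total)
# and splits them evenly between the threads that are currently active.
FPU_WIDTH_BITS = 2 * 128


def fp_bits_per_thread(active_threads: int) -> int:
    """Return the FP width available to each active thread in a module."""
    if active_threads not in (0, 1, 2):
        raise ValueError("a module runs at most two threads")
    if active_threads == 0:
        return 0
    return FPU_WIDTH_BITS // active_threads


def core_and_fpu_count(modules: int) -> tuple:
    """Return (integer cores, FPUs) for a chip with the given module count."""
    return modules * 2, modules


if __name__ == "__main__":
    cores, fpus = core_and_fpu_count(4)
    print(f"{cores} integer cores share {fpus} FPUs")
    for threads in (1, 2):
        print(f"{threads} active thread(s): "
              f"{fp_bits_per_thread(threads)} bits of FP width each")
```

With one thread active the whole FPU is available to it; with both cores busy each thread sees half, which is the behaviour the article describes.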
But while the Bulldozer micro-architecture is innovative, it brought an effect that many users did not like: lower performance per cycle than the previous K10.5 micro-architecture used in Phenom II. This lower per-cycle performance was offset by higher operating frequencies and by greater multiprocessing power, trading slightly lower per-cycle performance per core (K10.5 vs Bulldozer at the same frequency) for a greater number of cores (up to 8 in Bulldozer versus up to 6 in K10.5).
AMD improved this situation with its current Piledriver micro-architecture, which offers 15% higher performance per cycle than Bulldozer and incorporates improvements aimed at reducing power consumption, such as RCM (Resonant Clock Mesh). This allows it to reach even higher frequencies than the AMD FX-8150 (3.6GHz base and 4.2GHz in Turbo mode): the new AMD FX-8350 “Vishera” CPU will operate at a base frequency of 4GHz and at 4.2GHz in Turbo mode (with half of its cores active).
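As a rough back-of-the-envelope check, the figures above can be combined in a few lines of Python. This is only a sketch: it assumes performance scales linearly with clock frequency and that the 15% per-cycle gain applies uniformly, which real workloads will not match exactly. The function name is hypothetical.

```python
def relative_performance(per_cycle_gain: float,
                         old_ghz: float,
                         new_ghz: float) -> float:
    """Estimate new/old performance from a per-cycle gain and a clock change.

    Assumption: performance = (work per cycle) x (cycles per second).
    """
    return (1.0 + per_cycle_gain) * (new_ghz / old_ghz)


if __name__ == "__main__":
    # Base clocks: FX-8150 at 3.6GHz vs FX-8350 at 4.0GHz.
    base = relative_performance(0.15, 3.6, 4.0)
    # Turbo clocks are the same (4.2GHz), so only the per-cycle gain remains.
    turbo = relative_performance(0.15, 4.2, 4.2)
    print(f"base clocks:  {base:.2f}x")   # 1.28x
    print(f"turbo clocks: {turbo:.2f}x")  # 1.15x
```

Under these assumptions the FX-8350 would be roughly 28% faster than the FX-8150 at base clocks and 15% faster at the shared 4.2GHz Turbo clock.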
Vishera (based on the Piledriver micro-architecture) is still a few months away, and while AMD prepares its launch it is also readying the future Steamroller micro-architecture, which we describe below.
The Steamroller micro-architecture
Steamroller, AMD’s third-generation modular micro-architecture, promises improved performance in all areas while focusing on performance per watt. To that end AMD has heavily optimized its modules to prioritize performance per cycle, introducing many improvements that better exploit their full capabilities.
If Piledriver already improved the branch prediction unit, the prefetcher, the scheduler, the shared instruction decoders, the out-of-order execution resources, the floating point unit and the efficiency of the L1/L2 caches, and also reduced power consumption, what is left to improve? Quite a lot.
Modules with higher performance per cycle
Steamroller keeps the basic modular design of Bulldozer and Piledriver, but brings many improvements aimed at raising both the chip’s performance per cycle and its massively parallel performance.
For starters, each of the two integer cores in a module gets its own dedicated four-wide instruction decoder (both can run in parallel), a big improvement over the single four-wide decoder shared between the two integer cores and the floating point unit in Bulldozer and Piledriver modules. This eliminates the performance penalty of running two threads per module, while also increasing single-thread performance.
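The effect of a shared versus a dedicated decoder on two threads can be sketched with a toy Python model. It assumes a shared decoder serves one thread per cycle in strict round-robin order and always decodes its full width of four instructions; real front ends are far more complex, and this model does not capture the single-thread gains mentioned above. All names are hypothetical.

```python
DECODE_WIDTH = 4  # instructions per cycle per decoder, as stated in the text


def decoded_per_thread(cycles: int, threads: int, dedicated: bool) -> list:
    """Count instructions decoded for each thread over a number of cycles.

    Assumption: a shared decoder alternates between threads every cycle,
    while a dedicated decoder serves its own thread on every cycle.
    """
    if threads not in (1, 2):
        raise ValueError("a module runs one or two threads")
    totals = [0] * threads
    for cycle in range(cycles):
        if dedicated:
            # Each thread owns a decoder, so every thread advances each cycle.
            for thread in range(threads):
                totals[thread] += DECODE_WIDTH
        else:
            # One decoder for the module: only one thread advances per cycle.
            totals[cycle % threads] += DECODE_WIDTH
    return totals


if __name__ == "__main__":
    print("shared:   ", decoded_per_thread(100, 2, dedicated=False))  # [200, 200]
    print("dedicated:", decoded_per_thread(100, 2, dedicated=True))   # [400, 400]
```

In this model, two threads on a shared decoder each get half the decode bandwidth, while dedicated decoders give each thread the full width, which is the two-thread penalty the new design removes.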
The instruction prefetch unit, the scheduler (between 5% and 10% more efficient thanks to smarter resource management) and the dispatch stage have also been improved. As a result, conditional branches are handled more efficiently (by 20%), per-thread efficiency increases (by more than 25%) and instruction cache misses are reduced (by 30%), the latter thanks to larger L1 instruction and data caches with lower latency.
Combined, these improvements bring a 30% increase in per-cycle performance in integer calculations.
Focused on performance per watt
AMD made sure that many of Steamroller’s architectural improvements are accompanied by lower power consumption. The new third-generation Flex-FP floating point unit increases performance and reduces power consumption by sharing resources between the x87/MMX unit and its two 128-bit FMAC units, which in turn reduces the number of transistors used.
The module’s new execution logic keeps all of the chip’s pipelines loaded with data, improving internal transfers, while making better use of resources by disabling or enabling the various units of the chip according to the application’s requirements (this includes sections of the L2 cache). This smarter use of resources provides the best possible performance together with low power consumption.
It also includes dedicated hardware that monitors transfers between the CPU and the GPU in real time, further improving the efficiency of both components (important for future Steamroller-based APUs).
Steamroller-based products
AMD has announced future Opteron microprocessors and APUs based on Steamroller, but its presentation repeatedly mentions the great improvement in gaming performance that Steamroller brings, so it is almost certain that we will also see Steamroller-based APUs and CPUs for desktops and laptops.
What comes next
We know that Steamroller’s successor will be the future Excavator micro-architecture, which promises a performance boost as large as or larger than the one Steamroller will bring, but unfortunately we will not have data on Excavator until at least next year.