AMD Details Renoir: The Ryzen Mobile 4000 Series 7nm APU Uncovered
by Dr. Ian Cutress on March 16, 2020 11:00 AM EST
AMD’s latest Ryzen mobile product is the first design the company has done that combines CPU, GPU, and IO all on a monolithic die in TSMC’s 7nm process.
The CPU part of the design is very similar to what we’ve seen on the desktop: two quad-core groups, each with its own L3 cache shared between the cores. Compared to the desktop design, the mobile version is listed as being ‘optimized for mobile’, primarily through the smaller L3 cache: only 4 MB per quad-core group, rather than the 16 MB per quad-core group we see on the desktop. While the smaller L3 cache might mean more trips out to main memory to get data, AMD sees it as saving both power and die area, with this level of cache being the right balance for a power-limited chip.
Compared to the previous generation of Zen mobile processors, the CPU side of this generation comes with a 15% per-core iso-frequency improvement, thanks to the improvements at the heart of each core. We’ve covered these in detail in our desktop analysis. For the mobile platform, however, there is not only a raw performance uplift but also a frequency uplift, moving from 4.0 GHz in the prior gen up to 4.3 GHz here. AMD says that actual workload performance gets a significant uplift due to the new power features we’ll discuss in due course.
The GPU side is where we see bigger changes. AMD does two significant things here: it has reduced the maximum number of graphics compute units from 11 to 8, but it also claims a +59% improvement in graphics performance per compute unit despite using the same Vega graphics architecture as in the prior generation. Overall, AMD says, this affords a peak compute throughput of 1.79 TFLOPS (FP32), up from 1.41 TFLOPS (FP32) on the previous generation, or a +27% increase overall.
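As a rough sanity check on those throughput numbers, here is a minimal sketch of the usual peak-FP32 arithmetic. It assumes 64 FP32 lanes per Vega compute unit and counts one fused multiply-add per lane per clock as two operations. Neither figure comes from AMD's slides, and the 1750 MHz clock is the peak graphics frequency quoted for Renoir.

```python
# Back-of-the-envelope check of the quoted peak FP32 figures.
# Assumptions (not taken from AMD's slides): 64 FP32 lanes per Vega
# compute unit, and one fused multiply-add per lane per clock = 2 FLOPs.
LANES_PER_CU = 64
FLOPS_PER_LANE_PER_CLOCK = 2


def peak_fp32_tflops(compute_units, clock_mhz):
    """Theoretical peak FP32 throughput in TFLOPS."""
    flops = compute_units * LANES_PER_CU * FLOPS_PER_LANE_PER_CLOCK * clock_mhz * 1e6
    return flops / 1e12


def percent_uplift(new, old):
    """Relative increase of new over old, in percent."""
    return (new / old - 1.0) * 100.0


# Renoir: 8 compute units at the quoted 1750 MHz peak graphics clock.
renoir_tflops = peak_fp32_tflops(compute_units=8, clock_mhz=1750)

# Generational uplift computed from the two TFLOPS figures quoted above.
quoted_uplift = percent_uplift(1.79, 1.41)

print(f"Renoir peak FP32: {renoir_tflops:.2f} TFLOPS")
print(f"Uplift from quoted figures: {quoted_uplift:.0f}%")
```

Under those assumptions the 8 CU configuration lands on the quoted 1.79 TFLOPS, and the ratio of the two quoted figures rounds to the +27% AMD gives.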
AMD manages to improve the raw performance per compute unit through a number of changes to the design of the APU. Some of this is down to using 7nm and some is down to design decisions, but it also requires a lot of work on the physical implementation side.
For example, the 25% higher peak graphics frequency (up from 1400 MHz to 1750 MHz) is largely down to the physical implementation of the compute units. Part of the performance uplift is also due to memory bandwidth: the new Renoir design can support LPDDR4X-4266 at 68.3 GB/s, compared to DDR4-2400 at 38.4 GB/s. Most GPU designs need more memory bandwidth, especially APUs, so this will help drastically on that front.
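The bandwidth and frequency figures can be reproduced with simple arithmetic. The sketch below assumes a 128-bit memory interface in both generations. That width is an assumption rather than something stated in AMD's material, though it does recover both quoted bandwidth numbers.

```python
# Peak memory bandwidth = transfer rate x bus width.
# Assumption (not stated by AMD here): a 128-bit total memory interface
# for both the LPDDR4X and DDR4 configurations.
ASSUMED_BUS_WIDTH_BITS = 128


def bandwidth_gbs(mega_transfers_per_s, bus_width_bits=ASSUMED_BUS_WIDTH_BITS):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


lpddr4x_4266 = bandwidth_gbs(4266)
ddr4_2400 = bandwidth_gbs(2400)

# Peak graphics clock uplift quoted in the text: 1400 MHz -> 1750 MHz.
clock_uplift_pct = (1750 / 1400 - 1.0) * 100.0

print(f"LPDDR4X-4266: {lpddr4x_4266:.1f} GB/s")
print(f"DDR4-2400:    {ddr4_2400:.1f} GB/s")
print(f"Bandwidth gain: {(lpddr4x_4266 / ddr4_2400 - 1.0) * 100.0:.0f}%")
print(f"Clock gain:     {clock_uplift_pct:.0f}%")
```

On these assumptions the peak bandwidth grows considerably faster than the peak graphics clock, which fits the point that memory bandwidth is a large part of the per-compute-unit gain.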
There are also improvements in the data fabric. For the GPU, the data fabric is twice as wide, allowing for less overhead when bulk transferring data into the compute units. This technically increases idle power a little compared to the previous design, but the move to 7nm easily absorbs that. With less power overhead for bulk data transfers, more power is available to the GPU cores, which in turn means they can run at a higher frequency.
Coming to the Infinity Fabric, AMD has made significant power improvements here. One of the main ones is decoupling the frequency of the Infinity Fabric from the frequency of the memory. AMD was able to do this because of the monolithic design, whereas in the chiplet design of the desktop processors, the fixed relationship between the two values has to be in place, otherwise more die area would be needed to traverse the variable clock rates. This is also the primary reason we’re not seeing chiplet-based APUs at this time. The decoupling means that the IF can idle at a much lower frequency, saving power, or adjust to a relevant frequency to balance power and performance when under load.
Again we see the doubled bus width from the graphics engine to the fabric pop up here, giving a better power-per-bit metric. But one of the key aspects of this graph is that the power consumed by the fabric in the new processors is very even across a wide bandwidth range compared to the older processor, where the voltages likely had to be stepped up as bandwidth increased, introducing additional latency factors for performance. Renoir does away with this, and AMD is claiming 75% better fabric efficiency compared to the previous generation.
Orthogonal to the raw improvements, AMD has also improved the media capabilities, with a new HDR/WCG encode engine for HEVC, which according to AMD should give a 31% encoding speedup when used.
eek2121 - Monday, March 16, 2020
This is an intriguing part. I am hoping for laptop designs with a 4800U and 5600M, but also desktop APUs. Hopefully AMD can bring some of the new stuff forward to desktop Zen 3 as well.
heffeque - Monday, March 16, 2020
It would be interesting to see these in fan and fanless AMD versions of Surface Pro versus fan and fanless Intel versions of Surface Pro.
I'm especially interested in battery life, since AMD 3780U Surface Pro has horrible battery life compared to its Intel counterpart.
The_Assimilator - Monday, March 16, 2020
The fact that OEMs are willing to make custom designs for AMD is already a good sign that they're confident in the product. Lisa Su certainly has the right stuff.
Khenglish - Monday, March 16, 2020
I'm pretty unimpressed by the GPU vs the Vega 11 in APU desktops. The only major advantage Renoir has is higher clocks on the GPU core and higher officially supported memory speeds. They likely got the 56% performance per core improvement by comparing to a Zen+ with Vega 11, which will be severely clock constrained on 12nm with a bigger core, where Renoir gets an even higher clock advantage not just from the nominal clock, but also from Picasso APUs hitting their TDP limit hard in a 25W or 35W environment.
On desktop with much higher TDPs I expect Renoir to slightly beat the 3400g at stock clocks, but lose when comparing overclocked results. Picasso easily overclocks up to 1700-1800 MHz from the measly 1240 MHz stock clock. I would guess Renoir would hit around 2000, not enough to compensate for the smaller core.
eek2121 - Tuesday, March 17, 2020
There are a lot of problems with your comment, but let’s start with the obvious: The TDP of the part you mentioned is at least triple that of the 4800U. Depending on how the chip is configured it is quadruple.
These are laptop parts, we haven’t seen desktop APUs. AMD could add 3X as many Vega cores and still hit a 45-65 watt TDP or they can go aggressive on the CPU clocks like they did the 4900H.
Spunjji - Tuesday, March 17, 2020
I'm pretty sure the desktop APU won't have more Vega CUs.
tygrus - Tuesday, March 17, 2020
These days doubling the GPU cores/units and running half the speed is more energy efficient. Uses more die space but I don't understand the focus on GPU MHz over energy efficiency.
Spunjji - Tuesday, March 17, 2020
1) Not sure the evidence I've seen bears out that a 1700-1800 MHz GPU overclock is "easy". That sounds like the higher end of what you can expect. Would welcome evidence to the contrary, as I'm still considering picking one up.
2) RAM speed is the big difference here. The desktop APU should get much higher memory speeds than the 3400G due to the improved Zen 2 memory controller, which ought to relieve a significant bottleneck. GPU core overclocks weren't actually the best route to wringing performance out of the 3400G.
Fataliity - Monday, March 16, 2020
@Ian Cutress, did they say what version of N7 they used for this? The density looks like either a HPC + variant, or a N7 mobile variant from what I can tell?
abufrejoval - Monday, March 16, 2020
What I want is choice. And flexibility to enable it.
15 Watt TDP typically isn’t a hard limit nor is 35 or 45 Watt for that matter: It’s mostly about what can be *sustained* for more than a second or two. Vendors have allowed bursting at twice or more TDP because that’s what often defines ‘user experience’ and sells like hotcakes on mobile i7’s.
We all know the silicon is the same. Yes, there be binning but a 15 Watt part sure won’t die at 35, 45 or even 65 or 95 Watts for that matter: It will just need more juice and cooling. And of course, a design for comfortable cooling of 15 Watts won’t take 25 or 35 Watts without a bit of ‘screaming’.
But why not give a choice, when noise matters less than a deadline and you don’t want to buy a distinct machine for a temporary project?
I admit to having run machine-learning on Nvidia equipped 15.4” slim-line notebooks for days if not weeks, and having to hide them in a closet, because nobody in the office could tolerate the noise they produced at >100 Watts of CPU and GPU power consumption: That’s fine, really, when you can choose what to do where and when.
Renoir has a huge range of load vs. power consumption: Please, please, PLEASE ensure that in all form factors users can make a choice of power consumption vs. battery life or cooling by setting max and sustained Wattage preferably at run-time and not hard-wiring this into distinct SKUs. I’d want a 15 Watt ultrabook to sustain a 35 Watt workload screaming its head off, just like I’d like a 90 Watt desktop or a 60 Watt NUC to calm down to 45/35/25 Watt sustained for night-long batches in the living room or bed-side—if that’s what suits my needs: It’s not a matter of technology, just a matter of ‘product placement’.