AMD Details Renoir: The Ryzen Mobile 4000 Series 7nm APU Uncovered
by Dr. Ian Cutress on March 16, 2020 11:00 AM EST
AMD’s latest Ryzen mobile product is the first design the company has done that combines CPU, GPU, and IO all on a monolithic die in TSMC’s 7nm process.
The CPU part of the design is very similar to what we’ve seen on the desktop: two quad-core groups, each with its own L3 cache shared between its cores. Compared to the desktop design, the mobile part is listed as being ‘optimized for mobile’, primarily through a smaller L3 cache: only 4 MB per quad-core group, rather than the 16 MB per quad-core group we see on the desktop. While the smaller L3 cache might mean more trips out to main memory to get data, AMD sees it overall as saving both power and die area, with this amount of cache being the right balance for a power-limited chip.
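The cache trade-off can be illustrated with the textbook average memory access time (AMAT) model: a smaller cache misses more often, and every miss pays the trip to DRAM. This is a minimal sketch; the latencies and miss rates below are hypothetical placeholders, not measured figures for Renoir or the desktop parts.

```python
def amat(l3_hit_ns: float, l3_miss_rate: float, dram_ns: float) -> float:
    """Average access time for requests that reach L3 (textbook AMAT):
    L3 hit time plus miss rate times the extra trip to main memory."""
    return l3_hit_ns + l3_miss_rate * dram_ns


# Hypothetical placeholder numbers, NOT measured AMD figures.
L3_HIT_NS = 10.0
DRAM_NS = 80.0
# The smaller cache is assumed to miss more often than the larger one.
large_l3 = amat(L3_HIT_NS, l3_miss_rate=0.20, dram_ns=DRAM_NS)
small_l3 = amat(L3_HIT_NS, l3_miss_rate=0.35, dram_ns=DRAM_NS)

print(f"larger L3: {large_l3:.1f} ns, smaller L3: {small_l3:.1f} ns")
```

The model only captures latency; the power and die-area savings AMD cites are the other side of the balance, and they do not appear in this formula.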
Compared to the previous generation of Zen mobile processors, this generation brings the 15% per-core iso-frequency improvement on the CPU side, thanks to the changes at the heart of each core. We’ve covered these in detail in our desktop analysis. For the mobile platform, however, there is not only a raw performance uplift but also a frequency uplift, from 4.0 GHz in the prior generation to 4.3 GHz here. AMD says actual workload performance gets a significant uplift due to the new power features we’ll discuss in due course.
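As a back-of-the-envelope check, the iso-frequency gain and the clock gain can be multiplied to get an idealized peak single-thread uplift. This sketch assumes performance scales linearly with frequency, which is an upper bound: memory-bound code scales worse, and how long the chip holds 4.3 GHz depends on its power budget.

```python
def combined_uplift(ipc_gain: float, old_ghz: float, new_ghz: float) -> float:
    """Idealized speedup when an iso-frequency (IPC) gain and a clock gain
    multiply, assuming performance scales linearly with frequency."""
    return (1.0 + ipc_gain) * (new_ghz / old_ghz)


uplift = combined_uplift(ipc_gain=0.15, old_ghz=4.0, new_ghz=4.3)
print(f"idealized peak single-thread uplift: {uplift:.3f}x")
```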
The GPU side is where we see bigger changes. AMD does two significant things here: it has reduced the maximum number of graphics compute units from 11 to 8, and it claims a +59% improvement in graphics performance per compute unit despite using the same Vega graphics architecture as the prior generation. Overall, AMD says, this affords a peak compute throughput of 1.79 TFLOPS (FP32), up from 1.41 TFLOPS (FP32) on the previous generation, a +27% increase overall.
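The 1.79 TFLOPS figure follows from the standard peak-throughput formula for Vega: each compute unit has 64 stream processors, and a fused multiply-add counts as two FLOPs per lane per clock. The sketch below plugs in 8 CUs and the 1750 MHz peak graphics clock AMD quotes for Renoir, then compares AMD's two quoted totals.

```python
LANES_PER_CU = 64   # stream processors per Vega compute unit
FLOPS_PER_LANE = 2  # one fused multiply-add per lane per clock = 2 FLOPs


def peak_fp32_tflops(compute_units: int, clock_mhz: float) -> float:
    """Theoretical peak FP32 throughput of a Vega GPU, in TFLOPS."""
    flops = compute_units * LANES_PER_CU * FLOPS_PER_LANE * clock_mhz * 1e6
    return flops / 1e12


renoir = peak_fp32_tflops(compute_units=8, clock_mhz=1750)
overall_gain = 1.79 / 1.41 - 1  # ratio of AMD's quoted totals

print(f"Renoir peak: {renoir:.2f} TFLOPS")
print(f"vs previous gen: {overall_gain:+.0%}")
```

This is a theoretical ceiling; real games rarely sustain it, particularly when memory bandwidth is the bottleneck.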
AMD manages to improve the raw performance per compute unit through a number of changes to the design of the APU. Some of this is down to the move to 7nm and some to design decisions, but it also required a lot of work on the physical implementation side.
For example, the 25% higher peak graphics frequency (up from 1400 MHz to 1750 MHz) comes largely down to the physical implementation of the compute units. Part of the performance uplift is also due to memory bandwidth: the new Renoir design can support LPDDR4X-4266 at 68.3 GB/s, compared to DDR4-2400 at 38.4 GB/s. Most GPU designs benefit from more memory bandwidth, APUs especially, so this should help significantly on that front.
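Both bandwidth figures come from transfers per second multiplied by bytes per transfer. The sketch assumes a 128-bit total memory bus in both cases, which is the width that reproduces the 68.3 GB/s and 38.4 GB/s numbers.

```python
BUS_WIDTH_BITS = 128  # assumed total memory bus width for both configurations


def peak_bandwidth_gbs(transfer_rate_mts: float, bus_bits: int = BUS_WIDTH_BITS) -> float:
    """Peak DRAM bandwidth in GB/s: transfers per second times bytes per transfer."""
    return transfer_rate_mts * 1e6 * (bus_bits / 8) / 1e9


lpddr4x = peak_bandwidth_gbs(4266)  # LPDDR4X-4266
ddr4 = peak_bandwidth_gbs(2400)     # DDR4-2400

print(f"LPDDR4X-4266: {lpddr4x:.1f} GB/s, DDR4-2400: {ddr4:.1f} GB/s")
print(f"bandwidth gain: {lpddr4x / ddr4:.2f}x, GPU clock gain: {1750 / 1400:.2f}x")
```

Bandwidth grows by roughly 1.78x while peak GPU clock grows 1.25x, so in this comparison memory bandwidth grows faster than shader throughput, which is what a bandwidth-hungry APU needs.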
There are also improvements in the data fabric. For the GPU, the data fabric is twice as wide, allowing for less overhead when bulk transferring data into the compute units. This technically increases idle power a little compared to the previous design, but the move to 7nm easily absorbs that. With less power overhead for bulk data transfers, more power is available to the GPU cores, which in turn means they can run at a higher frequency.
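Why a wider path cuts bulk-transfer overhead is easiest to see by counting link cycles. In this sketch only the 2x ratio comes from AMD; the 32-byte and 64-byte widths and the 4 KiB payload are hypothetical values chosen for illustration.

```python
import math


def transfer_cycles(payload_bytes: int, bytes_per_cycle: int) -> int:
    """Fabric clock cycles needed to move a payload over a link of a given width."""
    return math.ceil(payload_bytes / bytes_per_cycle)


# Hypothetical link widths: only the 2x ratio is taken from AMD's description.
OLD_WIDTH, NEW_WIDTH = 32, 64  # bytes per fabric clock
PAYLOAD = 4096                 # one 4 KiB bulk transfer

old = transfer_cycles(PAYLOAD, OLD_WIDTH)
new = transfer_cycles(PAYLOAD, NEW_WIDTH)
print(f"{old} cycles on the narrow link vs {new} on the doubled link")
```

Halving the cycle count means the fabric can either finish a transfer sooner or run at a lower clock for the same throughput, which is where the power saving comes from.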
Coming to the Infinity Fabric, AMD has made significant power improvements here. One of the main ones is decoupling the frequency of the Infinity Fabric from the frequency of the memory. AMD was able to do this because of the monolithic design; in the chiplet design of the desktop processors, the two clocks have to be locked together, otherwise more die area would be needed to cross between the variable clock rates. This is also the primary reason we’re not seeing chiplet-based APUs at this time. The decoupling means that the IF can idle at a much lower frequency, saving power, or adjust to a suitable frequency to balance power and performance under load.
Again we see the doubled bus width between the graphics engine and the data fabric pop up here, giving a better power-per-bit metric. One of the key points of AMD’s graph is that the power consumed by the fabric in the new processors is very even across a wide bandwidth range compared to the older processor, where the voltages likely had to be stepped up as bandwidth increased, introducing additional latency for performance. Luckily Renoir does away with this, and AMD are claiming 75% better fabric efficiency compared to the previous generation.
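Power per bit is simply fabric power divided by the bit rate it moves. The sketch below expresses it in picojoules per bit. The 2 W at 50 GB/s operating point is hypothetical, not an AMD measurement. It also reads "75% better efficiency" as 1.75x the bits moved per joule, which is only one possible interpretation of AMD's claim.

```python
def picojoules_per_bit(fabric_watts: float, bandwidth_gbs: float) -> float:
    """Energy spent per bit moved across the fabric, in pJ/bit."""
    bits_per_second = bandwidth_gbs * 1e9 * 8
    return fabric_watts / bits_per_second * 1e12


# Hypothetical operating point, not an AMD measurement.
old_pj = picojoules_per_bit(fabric_watts=2.0, bandwidth_gbs=50.0)
# Assumed reading of "75% better": 1.75x the bits moved per joule.
new_pj = old_pj / 1.75

print(f"{old_pj:.2f} pJ/bit -> {new_pj:.2f} pJ/bit")
```

A flat power curve across bandwidth, as AMD shows for Renoir, means this energy-per-bit figure stays low even at high load instead of climbing as voltage is stepped up.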
Orthogonal to the raw improvements, AMD has also improved the media capabilities, with a new HDR/WCG encode engine for HEVC, which according to AMD should give a 31% encoding speedup when used.
watzupken - Tuesday, March 17, 2020
To this point,
"Renoir has a huge range of load vs. power consumption: Please, please, PLEASE ensure that in all form factors users can make a choice of power consumption vs. battery life or cooling by setting max and sustained Wattage preferably at run-time and not hard-wiring this into distinct SKUs. I’d want a 15 Watt ultrabook to sustain a 35 Watt workload screaming its head off, just like I’d like a 90 Watt desktop or a 60 Watt NUC to calm down to 45/35/25 Watt sustained for night-long batches in the living room or bed-side—if that’s what suits my needs: It’s not a matter of technology, just a matter of ‘product placement’."
I doubt they will ever let you do that on a laptop or even a NUC officially. The cooling solution implemented is usually very closely correlated to the TDP of the processor. Even when it is a downgrade from, say, 45W to 35W, these are usually tightly controlled by AMD and Intel. There is no guarantee that all chips will work well at a certain clockspeed across various TDPs. For example, a Ryzen 7 4800H may not be able to run at 4800U speeds when you reduce the TDP from 45W to 15W. U series chips are binned to be able to run at that specific clockspeed and likely also command a higher premium.
Tams80 - Tuesday, March 17, 2020
This. If you want to push your system, with the risk of damaging it, then you should be free to as long as there's no direct risk of it causing harm.
Namisecond - Thursday, March 26, 2020
Why are you using a laptop for a workstation task? If you have to hide it in a closet, then you really don't get to choose the where and when. Might as well rent out some hardware at a data center and do the work remotely. Not judging, just saying your way of doing things doesn't make a whole lot of sense to me.
yankeeDDL - Tuesday, March 17, 2020
I have a question: let's take Ryzen 7 4800U and Ryzen 3 4300U devices.
Both are 15W parts, yet one has 4/4 C/T and the other 8/16.
The 4300U has fewer GPU CUs and a lower clock.
How can they have the same TDP? Does this mean that the 4300U is likely to stay at Turbo a lot more consistently than the 4800U?
Spunjji - Tuesday, March 17, 2020
Possibly. It will more likely mean (at least initially) that the 4300U is an inferior specimen and needs more voltage to reach its standard clock speeds.
yankeeDDL - Tuesday, March 17, 2020
It's possible. I guess also that's why they can increase the base clock on lower core count.
djayjp - Tuesday, March 17, 2020
AMD: "Making the best graphics engine even better -- Vega 7nm" Surely they mean Navi...? Confused.
yankeeDDL - Tuesday, March 17, 2020
No, it is Vega architecture, ported to 7nm. With some improvements (as described in the article itself), but it is still Vega.
djayjp - Wednesday, March 18, 2020
Right I get that but surely Navi is a better architecture than Vega (look at Radeon VII vs 5700 XT).
djayjp - Wednesday, March 18, 2020
Especially for mobile