Qualcomm Previews Snapdragon X Elite SoC: Oryon CPU Starts in Laptops
by Ryan Smith on October 24, 2023 3:00 PM EST
While Qualcomm has become wildly successful in the Arm SoC market for Android smartphones, their efforts to parlay that into success in other markets have come up short so far. The company has produced several generations of chips for Windows-on-Arm laptops, and while each has incrementally improved on matters, it’s not been enough to dislodge a highly dominant Intel. And while the lack of success of Windows-on-Arm is far from solely being Qualcomm’s fault – there’s a lot to be said for the OS and software – silicon has certainly played a part. To make serious inroads on the market, it’s not enough to produce incrementally better chips – Qualcomm needs to make a major leap in performance.
Now, after nearly three years of hard work, Qualcomm is getting ready to do just that. This morning, the company is previewing their upcoming Snapdragon X Elite SoC, their next-generation Arm SoC designed for Windows devices. Based on a brand-new Arm CPU core design from their Nuvia subsidiary dubbed “Oryon”, the Snapdragon X Elite is to be the tip of the iceberg for a new generation of Qualcomm SoC designs. Not only is it the heart and soul of Qualcomm’s most important Windows-on-Arm SoC to date, but it will eventually be in smartphones and a whole lot more.
But we’re getting ahead of ourselves. For now let’s focus on the Snapdragon X Elite SoC and the Oryon cores underpinning it.
While this morning’s announcement from Qualcomm is far from a deep dive on the hardware, it’s our first look at what will be Qualcomm’s flagship SoC, and the new CPU cores within it. With a projected launch date of mid-2024, the first laptops based on the SoC are still several months away from hitting retail shelves – and about a year delayed overall. Nonetheless, Qualcomm has finished their silicon development work, and with the chip’s specifications locked down, the company is now on to polishing things for a launch next year.
The Oryon CPU cores within the Snapdragon X Elite are the culmination of Qualcomm’s Nuvia acquisition from early 2021, and an even longer period of work for the Nuvia team. The ambition of the team, and the importance of the custom Arm architecture CPU cores, cannot be overstated. So the Snapdragon X Elite is going to be an interesting chip on multiple levels, as it sets the pace for the next generation of Qualcomm chip designs.
Snapdragon Compute (Windows-on-Arm) Silicon

| AnandTech | Snapdragon X Elite | Snapdragon 8cx Gen 3 | Snapdragon 8cx Gen 2 | Snapdragon 8cx Gen 1 |
|---|---|---|---|---|
| Prime Cores | 12x Oryon 3.80 GHz, 2C Turbo: 4.3 GHz | 4x C-X1 3.00 GHz | 4x C-A76 3.15 GHz | 4x C-A76 2.84 GHz |
| Efficiency Cores | N/A | 4x C-A78 2.40 GHz | 4x C-A55 1.80 GHz | 4x C-A55 1.80 GHz |
| GPU | Adreno SD X Elite, 4.6 TFLOPS | Adreno 8cx Gen 3 | Adreno 690 | Adreno 680 |
| NPU | Hexagon, 45 TOPS (INT8) | Hexagon 8cx Gen 3, 15 TOPS | Hexagon 690, 9 TOPS | Hexagon 690, 9 TOPS |
| Memory | 8x 16-bit LPDDR5x-8533, 136 GB/sec | 8x 16-bit LPDDR4x-4266, 68.3 GB/sec | 8x 16-bit LPDDR4x-4266, 68.3 GB/sec | 8x 16-bit LPDDR4x-4266, 68.3 GB/sec |
| Wi-Fi | Wi-Fi 7 + BT 5.4 (Discrete) | Wi-Fi 6E + BT 5.1 | Wi-Fi 6 + BT 5.1 | Wi-Fi 5 + BT 5.0 |
| Modem | Snapdragon X65 (Discrete) | Snapdragon X55/X62/X65 (Discrete) | Snapdragon X55/X24 (Discrete) | Snapdragon X24 (Discrete) |
| Process | 4nm | Samsung 5LPE | TSMC N7 | TSMC N7 |
Starting with a high-level look at the chip, the Snapdragon X Elite is a high-performance SoC designed to power Windows-on-Arm laptops. Qualcomm isn’t listing any official TDPs, but the company has told us that the Elite is designed to scale across a “broad range” of thermal designs. Active cooling will be needed to get the most out of the Elite, but according to Qualcomm, passive/fanless designs are possible as well, and we should expect to see some retail devices designed as such.
Qualcomm is fabbing the chip on an unspecified 4nm process. Given their previous performance issues with Samsung’s 4nm line, it’s a very safe bet that they’re building this chip at TSMC – possibly using the N4P line. The silicon itself is a traditional monolithic die, so there is no use of chiplets or other advanced packaging here (though the wireless radios are discrete).
CPU: Oryon By The Dozen
The star of the show (if you’ll forgive the pun) is Oryon, Qualcomm’s new custom-designed Arm CPU core. Designed by the Nuvia team that Qualcomm acquired in 2021, Oryon is the first high-performance, fully-custom Arm CPU core created by Qualcomm in several years. And following multiple generations of lackluster Snapdragon Compute SoCs built out of Arm Cortex-A/X designs and functionally bigger versions of Qualcomm’s mobile SoCs, Oryon marks a major change in direction for Qualcomm.
Being that this is a preview, there are no significant architectural details to share on Oryon at this time. We don’t know the width, or various buffer sizes, execution ports, etc. But what we do know is that Qualcomm didn’t aim low with this SoC – the Nuvia team was working on a server-grade CPU core prior to their acquisition, and that kind of aggressive design has carried over into Oryon as well. Which, after all, was one of the major goals of Qualcomm’s acquisition, as they have desired a high performance CPU core to push them ahead of the other laptop (and eventually mobile) chip makers.
The Snapdragon X Elite SoC ships with 12 Oryon CPU cores – and that’s it. Unlike Qualcomm’s 8cx family of designs, there are no distinct “efficiency” and “performance” cores based on different microarchitectures; this is a homogeneous CPU design, more akin to traditional PC processors. This means that Oryon needs to pull double duty, excelling in performance in heavy workloads without chewing up a bunch of power in light workloads.
The Oryon CPU cores are broken up into three clusters of 4 cores each. We’re still waiting on further technical details, of course, but it’s a safe assumption that each cluster is on its own power rail, so that unneeded clusters can be powered down when only a handful of cores are called for.
Just on this basis alone, Snapdragon X Elite looks like a far more potent performer than the 8cx chips it replaces. The 8cx Gen 3 offered just 4 performance cores (Cortex-X1) and another 4 efficiency cores (Cortex-A78), so Snapdragon X Elite will hit the streets with 50% more CPU cores, never mind the higher performance of those cores. For a laptop chip, Qualcomm is throwing a lot of CPU cores at the matter.
With regards to clockspeeds, in an all-core turbo workload, all 12 Oryon CPU cores can run at up to 3.8GHz, power and thermal headroom permitting. Meanwhile in lighter workloads, the chip supports turboing up to 4.3GHz on 2 cores. Qualcomm’s slide on this matter shows a core from each cluster, but it’s unclear whether this is some kind of prime/favored core in action (where only certain cores are designed/validated for those speeds) or if it’s simply a stylistic choice.
Either way, Qualcomm is aiming to turbo to relatively high clockspeeds for their laptop chip, a notable distinction from their much more modestly clocked 8cx chips. While high clockspeeds alone do not make for a fast chip, one of the performance bottlenecks of the 8cx chips was their pokey clockspeeds, so if Oryon offers as high an IPC rate as we suspect it will, then this would go a long way towards boosting Qualcomm’s CPU performance to compete with the industry’s strongest players.
Memory: 128-bit LPDDR5x
Feeding the beastly Oryon CPU cores (as well as the rest of the chip) is a 128-bit LPDDR5x memory bus. This is less remarkable than the CPU side of the chip, but it’s important to note all the same. With the previous 8cx chips only supporting LPDDR4x, this brings Qualcomm back to parity with the latest PC chips in terms of memory technology support. And with supported data rates as high as LPDDR5x-8533, this will give Qualcomm one of the fastest memory controllers on the market.
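As a quick sanity check on the bandwidth figure, peak theoretical memory bandwidth follows directly from the bus width and the data rate. The short Python sketch below is plain arithmetic, assuming decimal gigabytes and no protocol overhead; the function name is our own and not anything from Qualcomm.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_mt_s: int) -> float:
    """Peak theoretical bandwidth in GB/sec (decimal), ignoring all overhead."""
    bytes_per_transfer = bus_width_bits / 8
    # MT/s * bytes per transfer = MB/s; divide by 1000 for GB/s
    return bytes_per_transfer * data_rate_mt_s / 1000

# Snapdragon X Elite: 8 x 16-bit channels of LPDDR5x-8533
x_elite = peak_bandwidth_gb_s(bus_width_bits=8 * 16, data_rate_mt_s=8533)
print(f"{x_elite:.1f} GB/sec")  # 136.5 GB/sec
```

That result lines up with the roughly 136GB/sec Qualcomm quotes for the chip. Sustained real-world bandwidth will be lower.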
Qualcomm is also quoting a total of 42MB of cache in the system sitting between the various processor blocks and system memory. Given the explicit mention of “total cache”, this is almost certainly L2 + L3. Previous Qualcomm designs have offered a 6MB shared L3 (last level) cache. If that’s the case again here, then that would mean there’s 3MB of L2 cache available for each CPU core – or some permutation thereof.
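The cache estimate above is simple subtraction and division, and it can be laid out explicitly. Note that the 6MB L3 is our assumption carried over from earlier Qualcomm designs, and the per-cluster figure is only one illustrative way the L2 could be organized; Qualcomm has confirmed neither.

```python
TOTAL_CACHE_MB = 42     # Qualcomm's quoted "total cache"
ASSUMED_L3_MB = 6       # assumption: same shared L3 as previous Qualcomm designs
CPU_CORES = 12
CORES_PER_CLUSTER = 4   # three clusters of four cores

l2_total_mb = TOTAL_CACHE_MB - ASSUMED_L3_MB
l2_per_core_mb = l2_total_mb / CPU_CORES
# Hypothetical alternative: the same L2 pooled and shared within each cluster
l2_per_cluster_mb = l2_per_core_mb * CORES_PER_CLUSTER

print(l2_total_mb, l2_per_core_mb, l2_per_cluster_mb)  # 36 3.0 12.0
```

If the L3 turns out to be a different size, the per-core figure shifts accordingly.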
GPU: Latest Generation Adreno
On the graphics side of matters, Snapdragon X Elite incorporates Qualcomm’s latest generation Adreno GPU. As is typical for Qualcomm in these matters, the company is saying virtually nothing about the architecture employed here, though it goes without saying that this is the latest and greatest iteration of Qualcomm’s in-house GPU design.
From a feature perspective, this is a DirectX 12-class GPU with ray tracing support, mirroring the capabilities Qualcomm introduced with last year’s Snapdragon 8 Gen 2 mobile SoC. Within the Windows ecosystem, it will almost certainly qualify as a DirectX 12 Ultimate (feature level 12_2) design.
Qualcomm is quoting a single throughput figure for the design: 4.6 TFLOPS at an unspecified bit depth/format (we’d guess FP32). Qualcomm has not previously disclosed similar figures for the 8cx chips, so it’s hard to say how this will compare. Or even how it will compare to other integrated GPUs, since there’s a lot more to real-world GPU performance than pure FLOPS.
The display controller portion of the GPU offers support for up to 4 DisplayPort displays. Besides an internal display for the laptop, it can drive a further 3 external displays (all DP 1.4), with one output being 5K capable, while the rest are 4K.
Finally, the SoC is getting Qualcomm’s latest video processing block (VPU) as well. This latest design not only supports AV1 decoding, but in a first for a Qualcomm SoC, AV1 encoding as well.
NPU: Hitting Hard with Hexagon
Next to the use of Oryon CPU cores, Qualcomm’s other big bet with the Snapdragon X Elite SoC is on the AI/neural processing unit side of things with their latest generation Hexagon NPU. Qualcomm is expecting that AI use will continue to rapidly grow over the next few years, and that the next big push is going to be AI models running locally on users’ systems. So they have invested a significant amount of resources in bulking up their Hexagon NPU for this generation of chips (X Elite and 8 Gen 3).
The end result is a heavily revised NPU, which should greatly exceed the 8cx Gen 3’s NPU performance. Qualcomm is quoting 45 TOPS of performance here for modest precision INT8, whereas 8cx Gen 3 was previously quoted at 15 TOPS for an unspecified data format.
Unlike their CPU and GPU, Qualcomm is sharing some architectural details here about the NPU, and what they’ve done to boost its performance. The tensor accelerator block, used in the densest matrix math, is outright 2.5x faster than before. Backing that (and the rest of the NPU) is a 2x larger shared memory/cache (though Qualcomm is not disclosing the actual size). Qualcomm is targeting large language models (LLMs) in particular with this change, as these are notoriously memory bound; according to the company, the chip will have enough resources to run a 13 billion parameter Llama 2 model locally.
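To illustrate why LLMs are so memory bound, consider a rough back-of-the-envelope model: generating each token requires reading essentially every weight once, so memory bandwidth divided by the size of the weights gives a loose upper bound on tokens per second. The sketch below applies that rule of thumb to a 13 billion parameter model. The weight precisions are our assumptions (Qualcomm has not said what format its Llama 2 claim uses), the bandwidth is the peak theoretical figure for a 128-bit LPDDR5x-8533 bus, and the model ignores the KV cache, activations, and any reuse out of the NPU's on-chip memory.

```python
PARAMS = 13e9          # 13 billion parameters, per Qualcomm's Llama 2 claim
PEAK_BW_GB_S = 136.5   # 128-bit LPDDR5x-8533, peak theoretical (decimal GB)

def weights_gb(params: float, bits_per_weight: int) -> float:
    """Size of the model weights alone, in decimal GB."""
    return params * bits_per_weight / 8 / 1e9

def max_tokens_per_s(params: float, bits_per_weight: int, bw_gb_s: float) -> float:
    """Loose upper bound: every weight is read once per generated token."""
    return bw_gb_s / weights_gb(params, bits_per_weight)

# Assumed precisions, for illustration only
for bits in (16, 8, 4):
    size = weights_gb(PARAMS, bits)
    rate = max_tokens_per_s(PARAMS, bits, PEAK_BW_GB_S)
    print(f"{bits:>2}-bit: {size:5.1f} GB of weights, at most ~{rate:.0f} tokens/sec")
```

Under these assumptions the weights alone come to 26 GB, 13 GB, and 6.5 GB respectively, which is why both quantization and a larger on-chip shared memory matter so much for running models of this size locally.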
Qualcomm has also made some power delivery changes to help drive more performance/efficiency out of the NPU. The power-hungry tensor block is now on its own power rail, with the rest of the NPU sitting on a separate shared rail. The company has also made some further undisclosed improvements to how they handle micro-tiling of inferencing workloads, which directly impacts how well they can split up workloads to keep the various sub-blocks of the NPU as busy as possible while minimizing intermediate memory operations.
I/O: USB4, PCIe 4, & Discrete Wi-Fi 7
Rounding out the Snapdragon X Elite, let’s talk I/O.
For internal I/O, the SoC offers PCIe 4.0 connectivity for NVMe storage. Elsewhere, the company is using PCIe 3 to supply connectivity to their modem and Wi-Fi solutions. No mention has been made of whether there are any free PCIe lanes for further peripherals.
For external I/O, the SoC supports USB4. According to Qualcomm, it can drive up to 3 such Type-C ports, and there are also a pair of USB 3.2 Gen2 outputs, and a single USB 2.0 output for internal use.
As noted earlier, both Wi-Fi and the modem are discrete for this product. The chip is intended to be paired with Qualcomm’s FastConnect 7800 silicon in the form of an M.2 card. The 7800 is their latest-generation Wi-Fi 7 solution, with support for 4 spatial streams as well as Bluetooth 5.4. The modem pairing is the Snapdragon X65, a high-performance 5G modem which was also available for the 8cx Gen 3.
The fact that neither wireless system is integrated into the SoC is unusual for Qualcomm, but perhaps not too surprising since they want to bring the Elite to market ASAP. Integrating these modules would take further time, and as a laptop SoC, Qualcomm doesn’t need to be as space efficient. In any case, the official line from Qualcomm is that the discrete modem is for OEM flexibility – to give OEMs the option to either include a modem or not – though Qualcomm of course will be strongly encouraging OEMs to include one as a major feature differentiator of the platform.
Performance Claims
As we don’t have enough architectural details to make any meaningful performance projections, the best thing we have for now are Qualcomm’s vague comparisons to their competitors. This is also the closest thing Qualcomm has provided to energy efficiency data for the chip (though, as always, target clockspeeds for a SKU play a massive part there).
With 12 performance cores, Qualcomm is pushing hard on multi-threaded performance. In fact, multi-threaded performance is the only CPU performance comparison Qualcomm makes, as there are no single-threaded comparisons to speak of. Make of that what you will.
Against what is implied to be an Intel 12-core mobile CPU design, Qualcomm is reporting that Snapdragon X Elite delivers 2x the multi-threaded performance in Geekbench 6. Or at iso-performance, they hit the same mark at one-third the power consumption.
Even against Intel’s best 14-core (H-class) chips, Qualcomm still reports that they lead by 60% in performance, and again are consuming one-third the power at iso-performance. Undoubtedly, a lot of this is down to the process node used, as TSMC N4 should be delivering a significant advantage over the Intel 7 process used on Intel’s current chips. This is also why the “moving target” aspect is so critical, as Snapdragon X Elite should be competing with the Intel 4 based Meteor Lake lineup by the time it launches next year.
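To put the iso-performance claims in perspective: matching a rival's score at one-third of its power is, by definition, a 3x performance-per-watt advantage at that single operating point. The sketch below only restates that arithmetic; the function name is ours, and it says nothing about efficiency elsewhere on the power curve.

```python
def iso_perf_efficiency_gain(power_fraction: float) -> float:
    """Perf/W advantage when equal performance is reached at
    `power_fraction` of the competitor's power draw."""
    if not 0 < power_fraction <= 1:
        raise ValueError("power_fraction must be in (0, 1]")
    return 1.0 / power_fraction

# Qualcomm's claim against both Intel comparison points: one-third the power
gain = iso_perf_efficiency_gain(1 / 3)
print(f"{gain:.1f}x perf/W at iso-performance")  # 3.0x perf/W at iso-performance
```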
More interesting, perhaps, is that Qualcomm is reporting a 50% multi-threaded performance advantage over an unspecified “Arm-based competitor.” This is meant to imply Apple, but depending on just how vague Qualcomm wishes to be, MediaTek does offer some Windows-on-Arm chips as well.
Qualcomm also expects to lead in GPU performance in 3DMark Wild Life Extreme, which, given a process node advantage and a tendency to build bigger iGPUs overall, is not surprising.
As always, these claims should be taken with a large grain of salt, especially for a platform that is still several months away from launching.
Snapdragon X Elite: Coming Mid-2024
Wrapping things up, Qualcomm is at this point putting the final touches on the Snapdragon X Elite. The company has deemed it one of their “most pivotal platform announcements in the company's recent history”, and for good reason. The Oryon CPU core being introduced here will eventually be at the heart of a good deal more products, so how competitive Oryon is will make or break Qualcomm’s next few generations of designs.
Devices based on the Snapdragon X Elite should be available in mid-2024. On that schedule, the Snapdragon X Elite should be competing against Intel’s Meteor Lake (Core Ultra) chips, AMD’s Phoenix chips (Ryzen Mobile 7000), and whatever the latest available iteration is of Apple’s M-series chips.
84 Comments
Mantion - Thursday, October 26, 2023
Agreed this won't work well with windows and typical windows software. Android and Linux are the obvious choice. Wouldn't take much to make a linux laptop that runs windows apps emulated better than x86 laptops.
techconc - Thursday, October 26, 2023
I don't think you realize how much of an improvement this is over existing Intel based solutions. Performance per watt is a very big deal for any mobile device, including laptops. It's not just about peak performance, it's about longer battery life, less heat and less fan noise. Apple's MacBooks have had an embarrassing lead in laptops for several years now. This is exactly what's needed to bring some level of parity to the PC market.
dotjaz - Sunday, October 29, 2023
Or so Qualcomm claims. For the entirety of its life cycle, software will be in compatibility mode. And that's not good news for either performance or power.
shadowjk - Monday, October 30, 2023
I'm sceptical whether battery life, heat, and fan noise are worse primarily because of the lack of availability of good chips. It seems to me it's more of a conscious decision by PC manufacturers, or lack of trying.
I imagine the biggest advantage that Apple has is that they also control the OS, and have better ability to make sure the OS doesn't spend 2 hours on some "maintenance tasks" while running on battery power.
ChrisGX - Friday, October 27, 2023
More ARM cores are shipped each year than cores based on any other instruction set. ARM is commonly found in powerful servers these days. The Fujitsu Fugaku supercomputer running on an ARM server chip sat at number 1 on the Top500 for a year or more very recently. Suspicions are irrelevant to actual computer performance.
NextGen_Gamer - Tuesday, October 24, 2023
For some reason, AnandTech is using the non-final slides from the presentation, whereas ArsTechnica does have the final ones, that show the actual model numbers of which competitor chips Qualcomm is comparing to. This includes the ARM slide, which explicitly states it is peak power 12-core Oryon vs Apple M2 peak power. Now, that isn't exactly a fair fight: 12-core Oryon is running at 50-Watts, while M2 is only 8-cores and running at 25-watts. A fairer fight is M2 Pro, also 12-cores (8P+4E) that is closer to 40-watts. And in that comparison, Oryon seems to be losing: it is roughly the same performance, for slightly more power, but again, Oryon is using 12 performance cores to match 8 performance & 4 efficiency cores of Apple's...
brucethemoose - Tuesday, October 24, 2023
I would bet the M2 Pro is a much more expensive system.
NextGen_Gamer - Tuesday, October 24, 2023
Price has never been a factor though when comparing CPU architectures, but only when talking about final products. Also, it says a lot that in the x86 competition, Qualcomm is happy to compare it to Intel 12- or 14-core processors, but only an 8-core Apple chip - even though a 12-core Apple SoC also exists. I will say this: in the GPU department, Qualcomm might have a winner. Again, no idea why AnandTech doesn't have these slides, but Qualcomm shows it being 80% faster than AMD's Radeon 780M in the Ryzen 7940HS - that is pretty big. One downside though: it will only have DirectX 12 compatibility drivers at launch, no Vulkan support.
Kangal - Friday, October 27, 2023
The fact that it's shipping with DirectX v12 first is not a downside. That's the hard work getting done first. Qualcomm has loads of experience with Vulkan on Snapdragon processors. So that will come sooner than later. But it's refreshing to hear their DX12 is done, and that performance comparisons have been done against the industry leader.
However, like all things, we will have to wait to get our hands on it, test it properly, then draw conclusions. It seems the direct comparisons will be against the AMD 7840u and M2 Max.
name99 - Thursday, October 26, 2023
A different way to look at this is that they only provide a "single" memory controller (depending on how you slice these things up, but the equivalent of the M2's memory controller), rather than the "dual" memory controllers of an M2 Pro.
You can view this as
- unbalanced for the amount of compute they provide? OR
- the iGPU is M2-class, not M2 Pro class? OR
- the new norm going forward for the ARM world will be more cores than M2?
It's unclear to me which of these is correct. There are indications (so, yes, very reliable!) that the M3 Pro/Max will have 16 "cores", for which the most obvious assumption is three 4-core P-clusters and one 4-core E-cluster. But another option is two 6-core P-clusters (there's no law that a cluster has to be 4 cores, and I'm unaware of any simulations that suggest, for example, the bandwidth of 4 cores to a shared L2 is high enough that 6 cores sharing that bandwidth would be a bad idea).
Which in turn opens up interesting options for M3.
Maybe it gets a single 6P-cluster, and Pro/Max get 2 6P-clusters? Or maybe M3 gets 2 4P-clusters (so it's at least 8+4 cores, if "12" is expected to be the new low-end norm going forward) and Pro/Max get 3 4P-clusters?