What You Need to Know About Nvidia Pascal

Fast Sync provides a smooth, tear-free, and low-latency gaming experience.

Portrait of Tammy Strobel
We’re finally leaving 28nm behind.

NVIDIA unveiled the GeForce GTX 1080 and 1070 to much fanfare in early May. Both of these cards were practically sweating performance through their pores, but that’s really only part of the reason why they are so attractive.

A lot of what makes the cards so interesting is the introduction of a brand new GPU architecture, dubbed Pascal. Maxwell was first introduced in early 2014, followed by second-generation cards later that year and in 2015, so Pascal is the first new architecture in quite a while.

PASCAL IS BASED ON THE 16NM PROCESS NODE

We’ve been stuck on 28nm for far too long, ever since mobile Fermi GPUs transitioned from 40nm to 28nm back in 2012. In tech years, that’s practically a century ago.

Thankfully, Pascal debuts on the smaller and vastly more efficient 16nm process node, which means it consumes far less power while serving up a whole lot more performance.

We also aren’t talking about just a few tens of watts here. At load, the GeForce GTX 1080 consumes over 100 watts less power than a custom, overclocked version of the GeForce GTX 980 Ti, one of the previous generation’s most powerful cards.

PASCAL HAS BETTER SUPPORT FOR MULTIPLE SCREENS

Pascal is a brand new architecture all right, and this doesn’t just mean more cores or faster clock speeds. NVIDIA GPUs feature Polymorph Engines that handle things like tessellation and perspective correction, and Pascal’s Polymorph Engine 4.0 (Maxwell still used version 3.0) features a new component called a Simultaneous Multi-Projection (SMP) unit.

This enables a single graphics card to support multiple displays without any warping of the final image. This was previously a problem, because the GPU assumed the displays were lined up in a straight line, when they were actually tilted at an angle to wrap around the user. This created distortions, and resulted in visual bugs like broken lines across monitors.

The SMP unit addresses this by allowing the GPU to process geometry through 16 different preconfigured projections from a single viewpoint in one pass. The picture that needs to be rendered is broken up into segments, so the correct perspective can then be displayed on the corresponding monitor. Ultimately, you get better-looking multi-display setups with just one card. Before this, the solution was to install multiple GPUs and assign each monitor to a particular GPU.
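To make the warping problem concrete, here is a minimal top-down sketch of the underlying geometry. It is not how the SMP hardware is implemented; the `project` function, the 45-degree monitor angle, and the sample point are all assumptions chosen for illustration. It only shows why a screen that is angled toward the viewer needs its own projection.

```python
import math

def project(point, yaw_deg):
    """Perspective-project a top-down point (x, z) onto a screen.

    yaw_deg is how far the screen's normal is turned to the viewer's right
    (0 means the screen faces straight ahead). Returns the horizontal screen
    coordinate, i.e. the tangent of the angle between the point and the
    screen's center, as seen from the viewer at the origin.
    """
    x, z = point
    yaw = math.radians(yaw_deg)
    # Rotate the point into the screen's own frame of reference.
    xr = x * math.cos(yaw) - z * math.sin(yaw)
    zr = x * math.sin(yaw) + z * math.cos(yaw)
    return xr / zr

# Hypothetical setup: right-hand monitor angled 45 degrees toward the viewer.
SIDE_YAW = 45
point = (1.0, 1.0)  # an object 45 degrees to the right of straight ahead

flat = project(point, 0)           # one flat projection for all monitors
angled = project(point, SIDE_YAW)  # projection matched to the side monitor
```

With a single flat projection, the object lands far off-center (`flat` is 1.0), while the projection matched to the angled monitor places it at that monitor's center (`angled` is approximately 0). Rendering each segment with its own projection is what removes the stretch.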

Polymorph Engine 4.0 features a new SMP unit for multiple projections.
PASCAL HAS A NEW VERSION OF GPU BOOST

With GPU Boost 3.0, NVIDIA now offers custom frequency offsets for individual voltage points, allowing the GPU to exploit headroom that a fixed frequency offset leaves unused. It is a natural evolution of GPU Boost 2.0: while 2.0 could already dynamically vary the boost speed according to temperature and load, it remained limited to a single frequency offset across all voltages.

The problem with this is that the resulting voltage/frequency curve isn’t necessarily in sync with what the GPU is really capable of, resulting in lost potential headroom.

GPU Boost 3.0’s support for per-voltage offsets corrects this. By letting the frequency offset vary with voltage, it minimizes any lost performance opportunity.
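The difference between one fixed offset and per-voltage offsets can be sketched with a toy voltage/frequency curve. Every number below is made up for illustration and is not measured from any real card, and the function names are hypothetical rather than part of any NVIDIA tool.

```python
# Hypothetical curve: voltage (V) -> (stock clock in MHz, highest stable clock in MHz).
CURVE = {
    0.80: (1500, 1650),
    0.90: (1650, 1760),
    1.00: (1780, 1850),
    1.05: (1850, 1900),
}

def headroom(curve):
    """Gap between the highest stable clock and the stock clock at each voltage."""
    return {v: stable - stock for v, (stock, stable) in curve.items()}

def fixed_offset(curve):
    """GPU Boost 2.0 style: one offset for every voltage point.
    It cannot exceed the smallest headroom anywhere on the curve."""
    offset = min(headroom(curve).values())
    return {v: offset for v in curve}

def per_voltage_offsets(curve):
    """GPU Boost 3.0 style: each voltage point gets its own offset."""
    return headroom(curve)

def lost_headroom(curve, offsets):
    """MHz left unused at each voltage point for a given set of offsets."""
    room = headroom(curve)
    return {v: room[v] - offsets[v] for v in curve}

lost_fixed = sum(lost_headroom(CURVE, fixed_offset(CURVE)).values())
lost_per_point = sum(lost_headroom(CURVE, per_voltage_offsets(CURVE)).values())
```

In this toy curve the fixed offset is capped at 50MHz by the tightest voltage point, which leaves a combined 180MHz of headroom unused across the other points (`lost_fixed`). Per-voltage offsets leave none (`lost_per_point` is 0).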

GPU Boost 3.0 pushes the frequency offset closer to the maximum theoretical clock.
PASCAL, MEET FAST SYNC

Pascal even has something to help less demanding titles run better. A card like the GeForce GTX 1080 would make short work of games like Counter-Strike: Global Offensive and introduce screen tearing because of the high frame rates. That’s where Fast Sync comes in to provide smoother, tear-free, and low-latency gaming, in the tradition of V-Sync and G-Sync, except that it doesn’t tie the GPU’s rendering rate to the display’s refresh rate.

Fast Sync tweaks the graphics pipeline and decouples the frame rendering and display stages from the rest of the pipeline. This allows the GPU to continue to work as fast as it can, and store excess frames temporarily in the GPU frame buffer. If V-Sync were on, the game engine would have to wait for the display to refresh, creating a backpressure of frames and increasing input latency. With Fast Sync, the GPU can render frames unhindered, so there is no backpressure and latency is reduced. It also cherry-picks complete frames from the frame buffer to display, so tearing is avoided as well.
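The frame-picking behavior described above can be approximated with a few lines of arithmetic. This is a simplified model, not NVIDIA’s implementation: it assumes the GPU renders at a constant rate that is at least as high as the refresh rate, and the function names are hypothetical.

```python
def fast_sync_frames(render_fps, refresh_hz, refreshes):
    """Return the frame shown at each display refresh under a Fast Sync-like policy.

    Simplifying assumptions: frame k finishes rendering at k / render_fps
    seconds, refresh r happens at r / refresh_hz seconds, and both rates are
    whole numbers so the integer arithmetic stays exact.
    """
    shown = []
    for r in range(1, refreshes + 1):
        # Newest frame that is fully rendered when refresh r occurs.
        newest_complete = (r * render_fps) // refresh_hz
        shown.append(newest_complete)
    return shown

def dropped_frames(shown):
    """Count rendered frames that never reached the display
    (valid when render_fps >= refresh_hz, so every refresh gets a new frame)."""
    return shown[-1] - len(set(shown))

shown = fast_sync_frames(render_fps=200, refresh_hz=60, refreshes=4)
# shown == [3, 6, 10, 13]; dropped_frames(shown) == 9
```

At 200 frames per second on a 60Hz display, the first four refreshes show frames 3, 6, 10, and 13. The nine frames in between are rendered but never displayed, which is the price paid for always showing the freshest complete frame without tearing.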

Here’s a look at how Fast Sync stacks up against V-Sync.
DIRECTX 12 GAMES RUN BETTER ON PASCAL

Asynchronous compute is one of the rare cases where NVIDIA is playing catch-up with AMD, but caught up it has. Workloads can be separated into graphics and compute tasks, and asynchronous compute describes the ability to process both types of tasks simultaneously.

This is a crucial feature in DirectX 12 games, and NVIDIA’s cards had lagged behind their AMD counterparts because of the lack of proper hardware support for asynchronous compute. Well, that’s no longer the case, thanks to something called dynamic load balancing, which allows either compute or graphics workloads to maximize use of the GPU’s resources if they are freed up. This translates into a decent performance bump in some of the latest DirectX 12 titles like Ashes of the Singularity and Hitman.
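A back-of-the-envelope model shows why letting freed-up resources switch workloads helps. It is an idealized sketch, not a description of Pascal’s scheduler: work is assumed to be perfectly divisible, switching is assumed to be free, and the work amounts, unit counts, and function names are all invented for illustration.

```python
def static_partition_time(graphics_work, compute_work, units, graphics_share):
    """Fixed split: each workload keeps its own slice of the GPU, and the
    job is done only when the slower slice finishes."""
    g_units = units * graphics_share
    c_units = units - g_units
    return max(graphics_work / g_units, compute_work / c_units)

def dynamic_balance_time(graphics_work, compute_work, units):
    """Idealized dynamic balancing: any unit that runs out of work picks up
    the other workload right away, so no unit sits idle."""
    return (graphics_work + compute_work) / units

# Hypothetical frame: 60 units of graphics work, 20 of compute, 10 GPU units.
static_time = static_partition_time(60, 20, units=10, graphics_share=0.5)
dynamic_time = dynamic_balance_time(60, 20, units=10)
```

With an even static split, the compute half finishes early and then idles while graphics grinds on, for a total of 12.0 time units. With dynamic balancing the same work completes in 8.0, because the idle half is put to use.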

Dynamic load balancing maximizes performance by minimizing the time GPU resources spend idling (gray area).