IT Brief Ireland - Technology news for CIOs & IT decision-makers

Intel releases neural texture compression SDK for game devs

Mon, 6th Apr 2026

Intel has released its Texture Set Neural Compression technology as a standalone software development kit for game developers working with texture compression.

The toolkit grew out of a research prototype shown by Intel Labs and now provides a defined software stack for encoding and decoding texture data with neural networks. It is intended to shrink texture assets on disk and reduce runtime memory use by compressing multiple texture channels into a single texture set.

How it works

At the centre of the system is a two-stage process. During encoding, a multi-layer perceptron is trained with stochastic gradient descent while the source texture data is mapped to a lower-dimensional latent representation.

The latent data is then stored in the widely supported BC1 block compression format. The encoded values are arranged in a four-texture feature pyramid with mip chains, yielding a 16-channel latent representation of the original input images.
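The encode-side loop can be sketched as a toy optimisation. Everything here is illustrative rather than Intel's actual trainer: per-texel latent vectors and a deliberately simplified single-layer linear decoder (standing in for the SDK's multi-layer perceptron) are fitted jointly with gradient descent so that decoded latents reproduce the source channels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 64 texels with 16 channels each, and an 8-dimensional
# latent per texel. The single linear "decoder" simplifies the SDK's
# multi-layer perceptron; shapes and loop are for illustration only.
texels = rng.random((64, 16)).astype(np.float32)          # target channels
latents = rng.normal(0, 0.1, (64, 8)).astype(np.float32)  # trainable latents
W = rng.normal(0, 0.1, (8, 16)).astype(np.float32)        # trainable decoder
lr = 0.05

for _ in range(1000):
    err = latents @ W - texels                  # decode and compare
    W -= lr * (latents.T @ err) / len(texels)   # gradient step on decoder
    latents -= lr * (err @ W.T)                 # gradient step on latents

loss = float(np.mean((latents @ W - texels) ** 2))
```

In the real pipeline the optimised latents are then quantised into BC1 blocks, which is what makes the representation cheap to store and sample.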

This structure lets developers combine diffuse RGB channels, ambient occlusion, roughness, metallic values, normals, and additional channels such as alpha values, masks, or precomputed data into one compressed texture set. To reconstruct the texture, the decoder uses a three-layer feed-forward multi-layer perceptron with a 16-neuron input layer, a 64-neuron hidden layer, and a 16-neuron output layer.
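Reconstruction can be sketched in a few lines. Only the 16-64-16 shapes and the feed-forward structure come from Intel's description; the random weights below merely stand in for the trained network.

```python
import numpy as np

rng = np.random.default_rng(1)

# Random weights for illustration; the real decoder ships trained weights.
W1 = rng.normal(0, 0.1, (16, 64)).astype(np.float32)  # 16 inputs -> 64 hidden
b1 = np.zeros(64, dtype=np.float32)
W2 = rng.normal(0, 0.1, (64, 16)).astype(np.float32)  # 64 hidden -> 16 outputs
b2 = np.zeros(16, dtype=np.float32)

def decode_texel(latent):
    """Feed-forward decode of one texel's 16 latent values into 16 channels
    (e.g. RGB, AO, roughness, metallic, normals, masks)."""
    hidden = np.maximum(latent @ W1 + b1, 0.0)  # hidden layer, ReLU activation
    return hidden @ W2 + b2                     # 16 reconstructed channels

channels = decode_texel(rng.random(16).astype(np.float32))
```

Because the network is this small, the per-pixel cost reduces to two modest matrix-vector products, which is what makes per-sample decoding on the GPU plausible.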

Tooling changes

The SDK has been rebuilt around SlangPy and Slang compute shaders, allowing output in HLSL or SPIR-V shader formats. The decompressor interface is split across three main Slang files that can compile to C, C++, or HLSL.

The software also integrates Microsoft's linear algebra extension for HLSL, allowing compatible Intel graphics processors to use XMX cores for inferencing. For systems without XMX support, including older graphics hardware or CPU-based workloads, the SDK includes a fallback path based on fused multiply-and-add operations with float16 arrays.
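A rough sketch of what the fallback path computes, assuming (our assumption, not Intel's documented internals) that each layer's matrix-vector product is accumulated with scalar multiply-add steps over float16 arrays:

```python
import numpy as np

def fma_matvec_fp16(weights, x):
    """One decoder layer on the fallback path: accumulate the matrix-vector
    product with repeated a = a + w*x steps over float16 arrays."""
    weights = weights.astype(np.float16)
    x = x.astype(np.float16)
    acc = np.zeros(weights.shape[1], dtype=np.float16)
    for i in range(weights.shape[0]):
        acc = acc + weights[i] * x[i]   # fused multiply-add pattern
    return acc

rng = np.random.default_rng(2)
W = rng.random((16, 64)).astype(np.float32)
x = rng.random(16).astype(np.float32)
y = fma_matvec_fp16(W, x)
```

On XMX-equipped parts the same product is instead issued through the linear algebra extension, so the matrix hardware does the accumulation in one pass rather than row by row.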

Developers can choose different shader permutations and activation functions. ReLU is the default, while CU and GLU are offered as alternatives that produce better results for high dynamic range values.

Intel provided one performance comparison for the hardware path: "Using linear algebra we can show a significant speed up in inference time of about 3.4x versus the FMA implementation. This demonstrates a significant performance savings and minimal overhead when running on Intel GPUs with XMX cores," it said.

Compression options

The SDK includes two main feature pyramid layouts that trade image quality against storage reduction. In Variant A, latent values are stored in two full-resolution BC1 images and two half-resolution BC1 images.

In tests on 4K uncompressed bitmap textures of about 64 MB each, the top two levels were reduced to 10.7 MB and the bottom two to 2.7 MB. That works out to roughly a 9x compression ratio, which Intel said is close to double that of standard BC compression, with limited visual loss.

Variant B pushes compression further by using one full-resolution image followed by 1:2, 1:4, and 1:8 reduced images. Intel said the smallest level fell to 0.17 MB and the overall configuration delivered a 17x compression ratio, though it also produced more visible degradation and BC1 artefacts.
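Intel's headline figures can be roughly reproduced from first principles, under assumptions of ours rather than Intel's published methodology: BC1 stores each 4x4 texel block in 8 bytes (0.5 bytes per texel), a full mip chain adds about one third, and the 16-channel set is compared against four uncompressed 4K RGBA8 textures of 64 MB each.

```python
MB = 1024 * 1024

def bc1_with_mips(width):
    """Bytes for a square BC1 texture (0.5 bytes/texel) plus its
    full mip chain (a further ~1/3)."""
    return width * width * 0.5 * (4 / 3)

# Pyramid levels for a 4096x4096 texture set.
full, half, quarter, eighth = (bc1_with_mips(4096 // d) for d in (1, 2, 4, 8))

# Four uncompressed RGBA8 4K textures cover 16 channels: 4 x 64 MB = 256 MB.
uncompressed = 4 * 4096 * 4096 * 4

variant_a = 2 * full + 2 * half              # two full-res + two half-res
variant_b = full + half + quarter + eighth   # 1:1, 1:2, 1:4, 1:8 levels

ratio_a = uncompressed / variant_a   # ~9.6x, near the quoted 9x
ratio_b = uncompressed / variant_b   # ~18x, near the quoted 17x
```

On these assumptions a full-resolution level with mips comes to about 10.7 MB and the 1:8 level to about 0.17 MB, matching the per-level figures Intel quoted.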

Quality trade-off

To assess output quality, Intel used NVIDIA Labs' FLIP analysis tool alongside traditional image metrics such as peak signal-to-noise ratio. Based on close-up views of a rendered 4K model on a 4K monitor, Variant A showed about 5% perceptual loss, according to the company.

Variant B showed between 6% and 7% perceptual loss in the same analysis, which Intel said represents a noticeable difference in the error metrics for viewers.

Deployment choices

The software supports four deployment models for different game engine designs. In an offline distribution model, textures are compressed for delivery and then fully decompressed during installation so they sit uncompressed on the user's drive.

In a load-time model, textures remain compressed on disk and are decompressed into uncompressed video memory when a game loads. A stream-time model decompresses texels as needed during texture streaming, while a sample-time model keeps textures compressed on disk and in memory and performs decompression per pixel during rendering.

Intel also disclosed benchmark figures from a Panther Lake laptop with B390 integrated graphics. In a 1920x1080 full-screen compute shader test, the fused multiply-and-add implementation averaged 0.661 nanoseconds per pixel, while the linear algebra implementation averaged 0.194 nanoseconds per pixel.
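The quoted 3.4x follows directly from those timings, and it is straightforward to work out what they would mean per frame if every 1080p pixel were decoded (a simplifying assumption on our part; real workloads decode only the texels actually sampled):

```python
fma_ns_per_px = 0.661   # fused multiply-add fallback path
xmx_ns_per_px = 0.194   # linear algebra / XMX path

speedup = fma_ns_per_px / xmx_ns_per_px       # ~3.4x, matching Intel's figure

pixels = 1920 * 1080
frame_ms_fma = pixels * fma_ns_per_px / 1e6   # ~1.37 ms per frame
frame_ms_xmx = pixels * xmx_ns_per_px / 1e6   # ~0.40 ms per frame
```

Either way the absolute cost is small at 1080p, but the gap matters for the sample-time deployment model, where decoding happens every frame rather than once at load.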

Those figures underpin the company's cited 3.4x inference improvement and suggest the hardware-assisted route is designed to keep decoding overhead low on supported Intel graphics processors.