Interview: From CUDA to tile-based programming: NVIDIA’s Stephen Jones on building the future of AI


As the complexity of AI models increases and hardware evolves to meet demand, the software layer connecting the two must also adapt. We recently sat down with Stephen Jones, a Distinguished Engineer at NVIDIA and one of the original architects of CUDA.

Jones, whose background spans fluid mechanics and aerospace engineering, provided deep insight into NVIDIA’s latest software innovations, including the shift toward tile-based programming, the introduction of “Green Contexts”, and how AI is rewriting the rules of code development.

The following are highlights of our conversation.

Shift to tile-based abstraction

For years, CUDA programming has revolved around a hierarchy of grids, blocks, and threads. With its latest update, NVIDIA is introducing a higher level of abstraction: CUDA Tile.

According to Jones, this new approach allows developers to program directly on arrays and tensors rather than having to manage individual threads. “It extends existing CUDA,” Jones explained. “What we’ve done is we’ve added a way to directly talk about and program arrays of data, tensors, vectors… Allowing the language and the compiler to see what high-level data you’re working on has opened up a whole area of new optimizations.”

This change is partly a response to the rapid evolution of hardware. As tensor cores grow larger and denser to offset the slowing of Moore’s Law, mapping code to silicon becomes increasingly complex.

  • Future-proofing: Jones notes that by expressing the program as tensor operations (for example, tensor A times tensor B), the compiler does the heavy lifting of mapping data to each specific hardware generation.
  • Stability: This ensures that the program structure remains stable even as the underlying GPU architecture changes from Ampere to Hopper to Blackwell.
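The difference between thread-level and array-level thinking is easy to see in plain NumPy, which Jones cites as the mental model tile-based programming builds on. The sketch below is illustrative only; it uses ordinary NumPy, not the actual CUDA Tile API:

```python
import numpy as np

# Thread-style thinking: the programmer spells out every index,
# and the system sees only scalar operations.
def matmul_indexed(A, B):
    M, K = A.shape
    K2, N = B.shape
    C = np.zeros((M, N))
    for i in range(M):          # conceptually, one "thread" per output element
        for j in range(N):
            for k in range(K):
                C[i, j] += A[i, k] * B[k, j]
    return C

# Array-style thinking: one expression over whole tensors.
# The library (here NumPy; on a GPU, a tile-aware compiler) decides
# how to map the work onto the hardware.
def matmul_array(A, B):
    return A @ B

A = np.arange(6, dtype=np.float64).reshape(2, 3)
B = np.arange(12, dtype=np.float64).reshape(3, 4)
assert np.allclose(matmul_indexed(A, B), matmul_array(A, B))
```

Because the second form states *what* is computed rather than *how*, the compiler is free to retarget it as tensor cores change shape across generations, which is exactly the stability argument above.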

Python first, but not only Python

Recognizing that Python has become the language of artificial intelligence, NVIDIA is launching CUDA Tile support with Python first. “Python is the language of AI,” Jones said, adding that the array-based representation is “much more natural for Python programmers” who are accustomed to NumPy.

However, performance purists need not worry. C++ support is coming next year, in keeping with NVIDIA’s philosophy that developers should be able to speed up their code regardless of the language they choose.

“Green Contexts” and reducing latency

For engineers deploying large language models (LLMs) in production, latency and jitter are significant concerns. Jones highlighted a new feature called Green Contexts, which allows precise partitioning of the GPU.

“The green context lets you divide the GPU into different sections,” Jones said. This allows developers to dedicate specific fractions of the GPU to different tasks, such as running pre-fill and decode operations simultaneously, without competing for resources. This micro-level specialization within a single GPU mirrors the separation observed at data center scale.
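The scheduling idea behind this, giving each workload a fixed slice of the machine so neither can starve the other, can be sketched with a toy model. The SM count and the 75/25 split below are made up for illustration; this is not the Green Context API itself, just the resource-partitioning logic it enables:

```python
# Toy model of dividing a GPU's streaming multiprocessors (SMs)
# between two LLM-serving phases. Illustrative only -- the real
# feature is exposed through the CUDA driver's Green Context API.
TOTAL_SMS = 132  # hypothetical SM count for this sketch

def split_sms(total, prefill_fraction):
    """Dedicate a fixed fraction of SMs to prefill; the rest go to decode."""
    prefill = int(total * prefill_fraction)
    decode = total - prefill
    return {"prefill": prefill, "decode": decode}

partition = split_sms(TOTAL_SMS, prefill_fraction=0.75)
# Each phase now runs in its own "section" of the GPU: prefill bursts
# cannot steal compute from decode, keeping per-token latency predictable.
print(partition)
```

The point of a fixed split, as opposed to letting both kernels contend for the whole device, is that the decode path sees a constant resource budget, which is what tames jitter.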

No black boxes: the importance of tooling

One of the widespread fears regarding high-level abstractions is loss of control. Jones drew on his experience as a CUDA user in the aerospace industry to emphasize that NVIDIA’s tools will never be black boxes.

“I really believe the most important part of CUDA is the developer tools,” Jones affirmed. He assured developers that even when using tile-based abstractions, tools like Nsight Compute will allow inspection down to individual machine-language instructions and registers. “You have to be able to tune, debug, and optimize,” he added. “It can’t be a black box.”

Accelerating time-to-results

Ultimately, the goal of these updates is productivity. Jones described the objective as “shifting the performance curve left,” enabling developers to reach 80% of potential performance in a fraction of the time.

“If you can get to market with 80% performance in a week instead of a month… then you’re spending the rest of your time just optimizing,” Jones explained. Importantly, this ease of use does not come at the expense of power; the new model still offers a path to 100% of the peak performance the silicon can deliver.

Conclusion

As AI and scientific computing converge, NVIDIA is positioning CUDA not only as a low-level tool for hardware experts, but also as a flexible platform that adapts to the needs of Python developers and HPC researchers alike. With support spanning Ampere through the upcoming Blackwell and Rubin architectures, these updates promise to streamline development across the entire GPU ecosystem.

For full technical details on CUDA Tile and Green Contexts, visit the NVIDIA Developer Portal.


Jean-Marc is a successful AI business executive. He leads and accelerates the development of AI-driven solutions and founded a computer vision company in 2006. He is a recognized speaker at AI conferences and holds an MBA from Stanford.
