L1 loss gradient, explained from scratch

by ai-intensify, April 10, 2026

Last updated by Editorial Team on April 10, 2026

Author(s): Utkarsh Mittal

Originally published on Towards AI.

A complete, step-by-step walkthrough of how gradient descent with absolute-value loss works – with diagrams you can actually follow.

If you’ve ever read a deep learning tutorial and encountered a derivative that appeared out of nowhere, this article is for you. We’re going to break down one of the simplest – yet most instructive – gradient calculations in machine learning: the gradient of the L1 (absolute-value) loss with respect to a single weight.

Our concrete example uses these values:

The article walks through the computation of the L1 loss gradient step by step, starting from a simple regression model and covering its components, the loss function, and how to obtain the gradient with respect to the weights. It emphasizes clarity using concrete examples and builds understanding progressively through the chain rule. It concludes by contrasting the L1 loss's robustness to outliers with the L2 loss's sensitivity to error magnitude, and offers guidance on when to use each loss function.
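To make the chain-rule step and the L1 versus L2 contrast concrete, here is a minimal Python sketch. It assumes a one-feature linear model `y_hat = w * x + b`, which is an assumption about what "simple regression model" means here. The numbers are hypothetical placeholders, not the article's own example values, which are not included in this excerpt. The function names are illustrative as well.

```python
def l1_loss_and_grad(w, b, x, y):
    """L1 loss |y - y_hat| and its gradient w.r.t. w for y_hat = w*x + b (assumed model)."""
    residual = y - (w * x + b)
    loss = abs(residual)
    # d|r|/dr is sign(r); at r == 0 we pick the subgradient 0 (a common convention).
    sign = (residual > 0) - (residual < 0)
    # Chain rule: dL/dw = dL/dr * dr/dw = sign(r) * (-x)
    grad_w = -sign * x
    return loss, grad_w


def l2_loss_and_grad(w, b, x, y):
    """Squared-error loss (y - y_hat)^2 and its gradient w.r.t. w, for comparison."""
    residual = y - (w * x + b)
    # Chain rule: dL/dw = 2*r * (-x), so the gradient grows with the error.
    return residual ** 2, -2 * residual * x


if __name__ == "__main__":
    w, b, x = 0.5, 0.0, 2.0        # hypothetical values
    for y in (3.0, 11.0):          # a modest target and an outlier-like target
        print("y =", y,
              "L1:", l1_loss_and_grad(w, b, x, y),
              "L2:", l2_loss_and_grad(w, b, x, y))
```

Under these assumed values, the L1 gradient has the same magnitude for both targets because it depends only on the sign of the residual and on `x`, while the L2 gradient scales with the size of the residual. That is the outlier-robustness contrast described above.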

Read the entire blog for free on Medium.

