Tag: reinforcement

Machine Learning
Runtime reinforcement: preventing “instruction decay” in long context windows

March 3, 2026

Author(s): Shreyash Shukla. Originally published on Towards AI. “Floating Brain” Problem: In our previous articles, we discussed how to give the agent knowledge (graphs), vision (shapes), …

AI Tools
Forget Keyword Mimicry: ByteDance AI Maps Molecular Bonds in AI Reasoning to Stabilize Long Chain-of-Thought Performance and Reinforcement Learning (RL) Training

February 22, 2026

ByteDance Seed recently released research that could change the way we build reasoning AI. For years, developers and AI researchers have struggled to ‘cold-start’ large language models (LLMs). Long Chain …

AI News
Kyutai releases Hibiki-Zero: a 3B-parameter simultaneous speech-to-speech translation model using GRPO reinforcement learning without any word-level aligned data

February 13, 2026

Kyutai has released Hibiki-Zero, a new model for simultaneous speech-to-speech translation (S2ST) and speech-to-text translation (S2TT). The system translates the source speech into the target language in real time. It handles …

AI News
A coding implementation to train safety-critical reinforcement learning agents offline using d3rlpy and conservative Q-learning with fixed historical data

February 4, 2026

In this tutorial, we build a safety-critical reinforcement learning pipeline that learns entirely from fixed, offline data instead of live exploration. We design a custom environment, generate a behavior dataset …

AI Tools
Nous Research releases NousCoder-14B: a competitive Olympiad programming model, post-trained on Qwen3-14B via reinforcement learning

January 19, 2026

Nous Research has introduced NousCoder-14B, a competitive Olympiad programming model post-trained on Qwen3-14B using reinforcement learning (RL) with verifiable rewards. On the LiveCodeBench v6 benchmark, which covers …

AI Tools
Meet SETA: open-source reinforcement learning training environments for terminal agents, with 400 tasks and the CAMEL toolkit

January 11, 2026

What does the end-to-end stack look like for terminal agents when you combine structured toolkits, synthetic RL environments, and benchmark-aligned evaluations? A team of researchers from CAMEL AI, Eigent …

AI Basics
Why even reinforcement learning can’t beat casinos (and why I created a simulation to prove it)

January 2, 2026

Author(s): alopix. Originally published on Towards AI. A mathematical and reinforcement learning tour through blackjack, poker, slot machines, and roulette. Casinos are one of the few environments where the rules are …

Generative AI
Liquid AI’s LFM2-2.6B-Exp uses pure reinforcement learning (RL) and dynamic hybrid reasoning to optimize small model behavior

December 28, 2025

Liquid AI has introduced LFM2-2.6B-Exp, an experimental checkpoint of its LFM2-2.6B language model trained with pure reinforcement learning on top of the existing LFM2 stack. The goal is simple: to …
