Skip to content
ATAI Today Brief
HomeNewsConceptsGuidesToolbox
AboutSubscribeUA
Subscribe

AI Today Brief

The daily AI-engineering brief. Built in public. EN · UA.

XTelegramLinkedInYouTubeRSS
NewsConceptsGuidesSubscribeAdvertiseAboutEditorial policyAI disclosurePrivacyTerms

© 2026 AI Today Brief. All rights reserved.

  1. Home/
  2. News/
  3. Models & research/
  4. Former Qwen Lead Junyang Lin Details Shift from Model Training to Agent Environments
Models & research

Former Qwen Lead Junyang Lin Details Shift from Model Training to Agent Environments

July 5, 2026· 6 min read
OKCurated by Oleksandr Kuzmenko, AI Product Engineer·Updated July 5, 2026·Sources cited on every story
AI-assisted · editor-reviewed·How we use AI
Former Qwen Lead Junyang Lin Details Shift from Model Training to Agent Environments

Junyang Lin, former technical lead of Alibaba's Qwen project, outlined the structural challenges of hybrid thinking models like Qwen3. He argues the industry is transitioning from reasoning thinking (like o1/DeepSeek-R1) to agentic thinking judged by closed-loop environmental actions.

Why it matters

Understanding the shift from training raw models to optimizing agentic environments is crucial for developers building autonomous, robust LLM applications that interact with real-world tools.

TL;DR

  • 01Former Qwen lead Junyang Lin argues the AI field is shifting from training models to training agents.
  • 02Agentic RL requires decoupling training from inference to prevent slow tool executions from stalling the GPU.
  • 03Qwen3 MoE models feature up to 235B parameters with 128 total experts (8 active per token) and a 128K context window.

Key facts

Qwen3 Context Window
128K (dense & MoE models), 32K (small dense models)
Qwen3 Parameter Sizes
0.6B to 235B parameters under Apache 2.0
Mixture of Experts Routing
Activates 8 out of 128 experts per token

Merging Thinking and Instruct Modes

Combining step-by-step reasoning (thinking mode) and instant direct responses (instruct mode) is notoriously difficult. Standard instruct models are rewarded for short, fast responses, whereas reasoning models are rewarded for extensive token usage. Careless model merging leads to bloated responses and degraded quality. Qwen3 addressed this challenge using a four-stage post-training pipeline, with hybrid thinking exposed directly in code via the enable_thinking flag to toggle modes.

Decoupling Agent Environments

In traditional reasoning reinforcement learning (RL), rollouts rely on self-contained trajectories with quick, verifiable mathematical or logical rewards. However, agentic RL relies on live environments with browsers, tool servers, and terminals. To prevent training loops from stalling while waiting for slow tool executions, developers must decouple training from inference, optimizing the stability and exploit-resistance of the environment itself.

Qwen3 Architecture Highlights

According to the architectural details disclosed by Lin, the Qwen3 Mixture of Experts (MoE) models scale up to 235B parameters, with 128 total experts, activating 8 experts per token. Small dense models tie input and output embeddings and run on a 32K context, whereas larger dense and MoE versions drop the tying and expand the native context window to 128K under the permissive Apache 2.0 license.

#Qwen3#QwQ-32B#Qwen2.5-Max
ShareShare on XShare on LinkedIn
← Previous storyIstota Personal AI Operating System Integrates with Nextcloud and Plain-Text Ledgers

Related stories

  • Models & researchLeveraging the Mistral AI Platform Beyond Standard Chatbot Integrations
  • Models & researchNVIDIA's ASPIRE framework distills validated coding agent fixes into reusable skills
  • Models & researchClaude Sonnet 5 Faces Criticism as Arena Users Report Downgrades

Email digest

Get the morning AI brief

One email a day — the stories that matter for engineers, founders and tech leads. Human-edited, with links to primary sources.

  • ✓120+ sources scanned daily
  • ✓Edited by a human
  • ✓1 email per day
  • ✓EN + UA

By subscribing you agree to the privacy policy.