Updated Jul 3 2026 at 11:38 AM ET

Agentic engineering with Claude Code, Cursor, Codex, and friends

Strategy, founder/researcher interviews, and industry analysis

Government-gated AI rollouts
Last Week in AI 35m ago
a16z Goes Global: Why American Tech Must Lead the World
a16z 59m ago
I Tested Gemini Spark: What Google's AI Agent Can Actually Do in 21 Minutes
Peter Yang 1h ago
Why Russia Never Stops Expanding - Sarah Paine
Dwarkesh Patel 17h ago
How a violent conqueror became the most beloved man in the city - Ada Palmer
Dwarkesh Patel 20h ago
Fable Is Back: Here's What You Should Try First
The AI Daily Brief 1d ago
Claude Fable 5 Is Finally Back: 5 Must-Try Use Cases Before July 7
Peter Yang 1d ago
AI That Discovers Math Will Also Explain It Better Than Us - Grant Sanderson
Dwarkesh Patel 1d ago
China's AI strategy is working
Peter Yang 1d ago
When the Government Says You Can't Deploy Your AI
Last Week in AI 2d ago
The reason Russia and China can't win at sea - Sarah Paine
Dwarkesh Patel 2d ago
How Big is the AI Economy
The AI Daily Brief 2d ago
Every employee should be a one-person startup
Peter Yang 3d ago
Mythos Returns But Not For Everyone
The AI Daily Brief 3d ago
Local Communities vs. National Security: The Brutal AI Data Center Trade-Off
Last Week in AI 4d ago
Meet Your Ad Hoc AI Licensing Regime
The AI Daily Brief 5d ago
Jailbreaks Forever: The Policy Endgame Nobody Wants
Last Week in AI 6d ago
India Can Create The Largest AI Companies
Y Combinator 6d ago
This is New Media
a16z 6d ago
Zynga Founder: Consumer Is Not Investible Right Now - That's Why You Should Build It
Y Combinator 8d ago
Against all odds. Congratulations Elon Musk and SpaceX.
a16z 8d ago
How to Get Your First 10 Customers
Y Combinator 10d ago
Why Attention is Becoming a Competitive Advantage | a16z
a16z 11d ago

Latest cs.AI / cs.LG / cs.CL preprints from arXiv

Distributed Attacks in Persistent-State AI Control

As AI coding agents become more autonomous, they increasingly ship code iteratively, with the codebase persisting across sessions. This persistence creates a new attack surface: a misaligned or prompt-injected agent c...

Josh Hills, Ida Caspary, Asa Cooper Stickland cs.AI 21h ago
LACUNA: A Testbed for Evaluating Localization Precision for LLM Unlearning

LLMs memorize sensitive training data, including personally identifiable information (PII), creating a pressing need for reliable post hoc removal methods. Unlearning has emerged as a promising solution, with state-of...

Matteo Boglioni, Thibault Rousset, Siva Reddy et al. cs.CL 21h ago
Program-as-Weights: A Programming Paradigm for Fuzzy Functions

Many everyday programming tasks resist clean rule-based implementation, such as alerting on important log lines, repairing malformed JSON, or ranking search results by intent, and are increasingly outsourced to large...

Wentao Zhang, Liliana Hotsko, Woojeong Kim et al. cs.LG 21h ago
Online Safety Monitoring for LLMs

Despite alignment training, LLMs remain prone to generating unsafe outputs at deployment time. Monitoring outputs online and raising an alarm when safety can no longer be assumed is therefore critical. We study a simp...

Mona Schirmer, Metod Jazbec, Alexander Timans et al. cs.AI 21h ago
ReContext: Recursive Evidence Replay as LLM Harness for Long-Context Reasoning

Understanding and reasoning over long contexts has become a key requirement for deploying large language models (LLMs) in realistic applications. Although recent LLMs support increasingly long context windows, they of...

Yanjun Zhao, Ruizhong Qiu, Tianxin Wei et al. cs.AI 21h ago
What LLM Agents Say When No One Is Watching: Social Structure and Latent Objective Emergence in Multi-Agent Debates

LLM agents will increasingly act in socially structured settings where role, audience, and relational context can shape what is advantageous or costly to say. We study whether such social structure, without any explic...

Arman Ghaffarizadeh, Danyal Mohaddes, Aliakbar Izadkhah et al. cs.AI 21h ago
Reasoning LLM Improves Speaker Recognition in Long-form TV Dramas

Long-form TV dramas present a formidable challenge for comprehensive video understanding, where deciphering complex storylines often relies on speaker recognition, the task of accurately attributing each spoke...

Yuxuan Li, Lingxi Xie, Xinyue Huo et al. cs.CL 21h ago
DemoPSD: Disagreement-Modulated Policy Self-Distillation

On-policy self-distillation (OPSD) has emerged as a practical method for training large language models (LLMs) to reason, where a single model acts as both the teacher and the student with different levels of informat...

Yunhe Li, Hao Shi, Wenhao Liu et al. cs.LG 21h ago
Beyond Adam: SOAP and Muon for Faster, Label-Efficient Training of Machine Learning Interatomic Potentials

Machine learning interatomic potentials (MLIPs) have become a hallmark of AI for scientific simulation. While efforts on new architectures and datasets have led to increasingly accurate and general models, the choice...

Gil Harari, Yoel Zimmermann, Ola Tangen Kulseng et al. cs.LG 21h ago
Controllable Sim Agents with Behavior Latents

Realistic traffic simulation requires agents that imitate logged behavior and can also be steered along interpretable axes. Such controllability enables engineers to isolate variables, reproduce specific edge cases, a...

Juanwu Lu, Junyu Zhu, Ziran Wang cs.RO 21h ago
Towards Robustness against Typographic Attack with Training-free Concept Localization

Models trained via Contrastive Language-Image Pretraining (CLIP) serve as the foundational vision encoders for most modern Large Vision Language Models (LVLMs). Despite their widespread adoption, CLIP models exhibit a...

Bohan Liu, Wenqian Ye, Guangzhi Xiong et al. cs.CV 21h ago
G-RRM: Guiding Symbolic Solvers with Recurrent Reasoning Models

In this work, we focus on SE-RRMs, a symbol-equivariant instantiation of RRMs that exhibits improved extrapolation to larger problem sizes. We propose a neuro-symbolic approach, "Guiding with Recurrent Reasoning Mode...

Timo Bertram, Sidhant Bhavnani, Richard Freinschlag et al. cs.AI 21h ago
Visually Grounded Self-Reflection for Vision-Language Models via Reinforcement Learning

Large vision-language models can reason over multimodal inputs by generating textual chains of thought (CoT). A key capability exhibited in CoT reasoning is self-reflection: revisiting earlier decisions and correcting...

Liyan Tang, Fangcong Yin, Greg Durrett cs.CL 21h ago
Combating Textual Noise and Redundancy: Entropy-Aware Dense Visual Token Pruning

Visual token pruning is a crucial strategy for accelerating VLMs by compressing redundant image patches, yet existing methods often fail to preserve critical cues under dense instructions and fine-grained queries. In...

Xuehui Wang, Xuankun Yang, Wei Shen cs.CV 21h ago
Audio-Based Understanding of Audiobook Narration Appeal

Narration is central to the audiobook listening experience, shaping how listeners engage with and understand the content. This work explores how narration qualities shape an audiobook's appeal, noting that their effec...

Shahar Elisha, Mariano Beguerisse-Díaz, Emmanouil Benetos cs.CL 21h ago
TestEvo-Bench: An Executable and Live Benchmark for Test and Code Co-Evolution

Software tests and code evolve together: a code change should be followed by new or updated tests that record the new software behavior. Yet existing test generation and update benchmarks often isolate the test from t...

Jiale Amber Wang, Kaiyuan Wang, Pengyu Nie cs.SE 22h ago
Human Capital, Not Model Benchmarks, Predicts Hybrid Intelligence in Forecasting

Whether pairing people with AI helps or hurts is usually reported as a single average effect. Using a real-money prediction market (Polymarket) as an objective, externally resolved benchmark, this pilot shows that the...

Vivienne Ming cs.CY 22h ago
Learning to Move Before Learning to Do: Task-Agnostic pretraining for VLAs

Vision-Language-Action (VLA) models are fundamentally bottlenecked by the scarcity of expert demonstrations -- triplets of observations, instructions, and actions that are costly to collect at scale. We argue that thi...

Junhao Shi, Siyin Wang, Xiaopeng Yu et al. cs.RO 22h ago
Will Scaling Improve Social Simulation with LLMs?

Large Language Model (LLM) social simulations are a promising research method, but they are not yet faithful enough to be adopted widely. In this work, we investigate whether the current scaling paradigm in language m...

Caleb Ziems, William Held, Su Doga Karaca et al. cs.CL 22h ago
OrbitQuant: Data-Agnostic Quantization for Image and Video Diffusion Transformers

Diffusion transformers (DiTs) achieve state-of-the-art image and video generation, but their multi-step sampling and growing parameter count make inference expensive. Post-training quantization (PTQ) is the natural re...

Donghyun Lee, Jitesh Chavan, Duy Nguyen et al. cs.CV 22h ago
Neuron-Aware Data Selection for Annotation-Free LLM Self-Distillation

Post-training large language models (LLMs) without real-world interaction feedback or human-labeled supervision remains challenging, particularly in specialized domains where expert annotations are costly to obtain. R...

Zhuowei Chen, Xiang Lorraine Li cs.LG 22h ago
Language Models as Measurement Apparatus for Culture

Language models are increasingly used to quantify cultural phenomena, but what makes such measurement distinctively cultural? This paper argues that NLP work on culture is a material-discursive practice: the apparatus...

Kent K. Chang cs.CL 22h ago
Understanding the Robustness of Distributed Self-Supervised Learning Frameworks Against Non-IID Data

Recent research has introduced distributed self-supervised learning (D-SSL) approaches to leverage vast amounts of unlabeled decentralized data. However, D-SSL faces the critical challenge of data heterogeneity, and t...

Xuanyu Chen, Nan Yang, Shuai Wang et al. cs.LG 22h ago
Optimal Stabilizer Testing and Learning with Limited Quantum Memory

We study stabilizer state testing and learning with limited coherent quantum memory. Here an algorithm sequentially receives copies of an unknown $n$-qubit state, but may keep only $k$ qubits of coherent quantum memor...

Srinivasan Arunachalam, Louis Schatzki quant-ph 22h ago
🖥️ NUC-Lab · Ollama v0.30.10 + Gemma 4 / Qwen 3.6 confirmed working on RTX 5070 Ti class hardware (ASUS NUC 15 Pro, 96 GB DDR5)