Briefs
Short paper notes
- Elastic Queries Reinforcement Learning: Self-Aware Policy Execution for VLA Models 2026-06-15
An elastic VLA execution framework that leaves the frozen flow-based VLA untouched while a lightweight RL adaptor dynamically selects, at every query, the latent steering w, the number of denoising steps K, and the execution chunk length C, spending more compute and replanning more often in hard states and using less compute with longer open-loop execution in easy states.
Korean, inference-time, success-rate, VLA, scheduler-training, auxiliary-module-training
- ReactVLA: Fast and Lightweight Reactive Robot Manipulation via Improved Mean Flow Action Generation 2026-06-15
A low-latency reactive robot manipulation policy that reduces the inference-latency bottleneck of diffusion / flow-based VLA policies by replacing action generation with improved Mean Flow (iMF) based one-to-few-step continuous action chunk generation and combining it with an Attention Residuals (AttnRes) Transformer.
Korean, inference-time, VLA, component-scratch-training
- WAM4D: Fast 4D World Action Model via Spatial Register Tokens 2026-06-15
Instead of producing 4D geometry directly as an inference-time output, training-time spatial register tokens are made to predict future depth, distilling a geometric foundation prior into a causal video-action WAM; at deployment the geometry branch is removed so that action chunks are generated quickly.
Korean, success-rate, WAM, fine-tuning, auxiliary-module-training, component-scratch-training
- µ0: A Scalable 3D Interaction-Trace World Model 2026-06-15
A 3D trace-space world model: in pretraining it learns semantic 3D interaction traces extracted from heterogeneous videos without action-labeled robot data, and downstream it injects the hidden features of the frozen trace world model into an action expert to build a robot policy.
Korean, success-rate, WAM, foundation-model, training-data, component-scratch-training
- EgoEngine: From Egocentric Human Videos to High-Fidelity Dexterous Robot Demonstrations 2026-06-12
A human-video-to-robot-demo data engine that converts egocentric human manipulation videos via a digital twin, jointly generating robot observation videos and executable robot action trajectories, and uses them to train real-robot dexterous visuomotor policies.
Korean, success-rate, training-data, auxiliary-module-training
- Improving Robotic Generalist Policies via Flow Reversal Steering 2026-06-12
A training-free steering method that maps a coarse semantic action to latent noise through the reverse ODE of a frozen flow-matching VLA and then denoises it again, invoking more refined action modes inside the generalist policy prior.
Korean, success-rate, inference-time, VLA, auxiliary-module-training, training-free
- Ambient Diffusion Policy: Imitation Learning from Suboptimal Data in Robotics 2026-06-11
An imitation learning method that, rather than simply mixing suboptimal / OOD robot demonstrations into Diffusion Policy training, restricts the "usable range" according to the diffusion timestep so that only the useful global plan or local motion primitives are extracted.
Korean, success-rate, diffusion-policy, scratch-training, training-data
- Dynamic Execution Horizon Prediction for Chunk-based Robot Policies 2026-06-11
A lightweight execution-horizon predictor that keeps the action generator of a pretrained action-chunking robot policy fully frozen and learns with PPO, from the current observation and the predicted action chunk, how many steps to execute open-loop this time.
Korean, inference-time, success-rate, diffusion-policy, scheduler-training, auxiliary-module-training
- Efficient-WAM: A 1B-Parameter World-Action Model with Low-Cost Future Imagination 2026-06-10
Redefines the WAM's future video prediction not as photorealistic video generation but as low-cost coarse future guidance that aids action generation, and with a compact video expert + low-resolution future latent + asymmetric video-action denoising lowers real-world policy inference latency to about 98 ms/chunk at roughly 1B scale.
Korean, inference-time, success-rate, WAM, fine-tuning, component-scratch-training
- SARM2: Multi-Task Stage Aware Reward Modeling for Self Improving Robotic Manipulation 2026-06-10
For self-improvement of VLA policies in long-horizon robotic manipulation, builds a dense reward/value model from an action-primitive stage estimator and a multi-gate MoE value head, and integrates it into SPIRAL's offline-to-online residual RL data flywheel.
Korean, success-rate, VLA, fine-tuning, auxiliary-module-training, MoE
- AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing 2026-06-09
An asynchronous WAM in which a Video-DiT world planner produces long-horizon latent context at low frequency, and an Action-DiT executor uses OVCR to correct that context against the latest observation and executes short action chunks in a high-frequency closed loop.
Korean, WAM, inference-time, success-rate, fine-tuning, auxiliary-module-training, component-scratch-training
- GEAR-VLA: Learning Geometry-Aware Action Representations for Generalizable Robotic Manipulation 2026-06-09
A geometry-aware manipulation policy that combines, on a Qwen2.5-VL-based VLA, a stop-gradient DiT flow action expert conditioned on the latent action token K/V cache, a VGGT-based 3D spatial encoder, and embodiment canonicalization based on relative end-effector actions, improving transfer to unseen objects, background shift, and robot embodiments unseen in pretraining.
Korean, success-rate, VLA, fine-tuning, auxiliary-module-training, component-scratch-training
- MotionWAM: Towards Foundation World Action Models for Real-Time Humanoid Loco-Manipulation 2026-06-09
Injects the intermediate denoising features of a Cosmos-Predict2.5-based Video DiT into a Motion DiT action policy, and ties the humanoid's upper and lower body into a single action space with SONIC-based unified whole-body motion tokens, performing real-time loco-manipulation on the Unitree G1.
Korean, inference-time, success-rate, WAM, fine-tuning, component-scratch-training
- Q-VGM: Q-Guided Value-Gradient Matching for Flow-Matching VLA Policies 2026-06-09
Fine-tunes a few-shot SFT'd π0.5 flow-matching VLA with offline RL, using a fixed self-rollout buffer and the action-gradient of a learned Q-critic, and turns the Q-gradient into denoising-time residual velocity supervision rather than a terminal action label.
Korean, VLA, success-rate, fine-tuning, auxiliary-module-training
- ActionMap: Robot Policy Learning via Voxel Action Heatmap 2026-06-08
Replaces the VLA's conventional single-point action decoder with a 3D translation / 3D rotation / gripper voxel heatmap action head, using geometric proximity in the action space as a training signal.
Korean, success-rate, VLA, fine-tuning, component-scratch-training
- Flash-WAM: Modality-Aware Distillation for World Action Models 2026-06-05
A step-distillation method that distills the WAM's video and action diffusion denoising differently, each according to its own noise regime, accelerating the WAM to a level that allows real-time chunk-level control while keeping performance close to the teacher's.
Korean, WAM, inference-time, fine-tuning, distillation
- 3DThinkVLA: Endowing Vision-Language-Action Models with Latent 3D Priors via 3D-Thinking-Guided Co-training 2026-06-04
Co-trains a pretrained VLA on VLA data + real-world 3D reasoning data, using a 3D foundation model and a reasoning-prompt teacher only during training, so that implicit 3D spatial reasoning is injected into action prediction even with 2D image-only inference.
Korean, VLA, success-rate, fine-tuning, auxiliary-module-training, component-scratch-training
- GRAIL: Generating Humanoid Loco-Manipulation from 3D Assets and Video Priors 2026-06-04
A data-generation / sim-to-real framework that uses 3D assets and video foundation model priors to generate 4D human-object interaction data for humanoid loco-manipulation entirely digitally, then converts it into a tracking policy and an egocentric visual policy for the Unitree G1 and deploys them on the real robot.
Korean, success-rate, training-data, fine-tuning, auxiliary-module-training, sim2real
- OSCAR: Omni-Embodiment Skeleton-Conditioned World Action Model for Robotics 2026-06-04
Fine-tunes the pretrained Cosmos-Predict2.5-2B video DiT with a 2D kinematic skeleton condition to generate action-conditioned future video across multiple robot embodiments and human hands, and uses it as a RoboArena policy evaluation proxy.
Korean, WAM, success-rate, fine-tuning
- Cosmos 3: Omnimodal World Models for Physical AI 2026-06-03
NVIDIA's large-scale foundation model that unifies language, image, video, audio, and action into a single Mixture-of-Transformers (MoT) based omnimodal world model, treating VLM, video generator, forward/inverse dynamics, and robot policy as one Physical AI backbone.
Korean, WAM, success-rate, foundation-Model
- Denoising Tells When to Replan: Denoising-Variance Adaptive Chunking for Flow-Based Robot Policies 2026-06-03
Uses the variance of the clean-action estimates over the last denoising steps as a per-future-action stability proxy, executing only the stable action prefix and replanning before the high-variance segment.
Korean, VLA, inference-time, training-free
- PointAction: 3D Points as Universal Action Representations for Robot Control 2026-06-03
Makes a pretrained video diffusion model generate temporally consistent XYZ pointmaps in addition to RGB, and an embodiment-specific diffusion action decoder converts these 3D point dynamics into action chunks.
Korean, WAM, success-rate, fine-tuning, component-scratch-training
- See Less, Specify More: Visual Evidence Budgets for Generalizable VLAs 2026-06-03
A planner-executor VLA generalization framework that jointly learns goal-preserving local language and a learned visual evidence budget, so that the VLA executor does not have to infer on its own "what to do / what to look at" from a coarse goal and the full image.
Korean, VLA, success-rate, fine-tuning
- Continuous Reasoning for Vision-Language-Action 2026-06-02
Defines VLA reasoning not as natural-language CoT but as a WAE-regularized Gaussian continuous reasoning interface that other VLA instances can also consume.
Korean, VLA, success-rate, fine-tuning
- PACE: Phase-Aware Chunk Execution for Robot Policies with Action Chunking 2026-06-02
A training-free test-time execution method for action-chunking robot policies that, instead of a fixed execution horizon, uses the low-speed valleys of the predicted action chunk as phase boundaries to select the execution length dynamically at every query.
Korean, VLA, inference-time, training-free
- VLAMotor: Test-Guided Enhancement of Vision-Language-Action Models via Agent-Based Data Synthesis 2026-06-02
A failure-driven VLA enhancement framework that actively searches for VLA failures with test cases that are far from the training distribution and non-redundant with each other, has a VLM agent repair the failing trajectories into successful ones, and uses them as fine-tuning data.
Korean, VLA, success-rate, fine-tuning
- τ0-WM: A Unified Video-Action World Model for Robotic Manipulation 2026-06-02
A manipulation framework that unifies action generation, video prediction, and action-conditioned evaluation on a single shared video diffusion backbone.
Korean, WAM, success-rate, foundation-Model
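
Two of the training-free briefs above ("Denoising Tells When to Replan" and "PACE") both come down to deciding how many actions of a predicted chunk to execute before replanning. The Python sketch below illustrates that idea only and is not either paper's implementation. The function names, the nested-list data layout, the per-dimension averaging, the fixed `threshold`, and the local-minimum rule for a "valley" are all assumptions made here for illustration.

```python
from math import sqrt


def stable_prefix_length(estimates, threshold, min_steps=1):
    """Length of the stable action prefix (hypothetical helper).

    estimates[k][h][d] is assumed to hold the clean-action estimate from the
    k-th of the last denoising steps, for future action h and dimension d.
    A future action counts as stable when its variance across the estimates,
    averaged over dimensions, stays at or below `threshold`.
    """
    num_estimates = len(estimates)
    horizon = len(estimates[0])
    prefix = 0
    for h in range(horizon):
        dims = len(estimates[0][h])
        var_sum = 0.0
        for d in range(dims):
            values = [estimates[k][h][d] for k in range(num_estimates)]
            mean = sum(values) / num_estimates
            var_sum += sum((v - mean) ** 2 for v in values) / num_estimates
        if var_sum / dims > threshold:
            break  # first high-variance action: replan before reaching it
        prefix = h + 1
    return max(min_steps, prefix)


def low_speed_valley_horizon(chunk):
    """Execution length ending at the first low-speed valley (hypothetical helper).

    chunk[h][d] is a predicted action chunk. Speed i is the Euclidean distance
    between action i and action i + 1; a valley is assumed to be a local
    minimum of that speed sequence. Without a valley the whole chunk runs.
    """
    speeds = [
        sqrt(sum((b - a) ** 2 for a, b in zip(prev, nxt)))
        for prev, nxt in zip(chunk, chunk[1:])
    ]
    for i in range(1, len(speeds) - 1):
        if speeds[i] < speeds[i - 1] and speeds[i] <= speeds[i + 1]:
            return i + 1  # execute actions 0..i, then replan
    return len(chunk)


# Two estimates of a 3-step, 1-D chunk that disagree only on the last action.
estimates = [[[0.0], [1.0], [2.0]], [[0.0], [1.0], [4.0]]]
print(stable_prefix_length(estimates, threshold=0.5))

# A 1-D chunk whose motion nearly pauses between the third and fourth actions.
chunk = [[0.0], [1.0], [2.0], [2.1], [3.1], [4.1]]
print(low_speed_valley_horizon(chunk))
```

In a control loop, the returned length would replace the fixed execution horizon: the policy executes that many actions from the chunk, then queries the model again. Neither rule needs training, which is what the `training-free` tag on both briefs refers to.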