Inference
Paper notes on real-time inference systems
- Realtime-VLA FLASH: Speculative Inference Framework for Diffusion-based VLAs 2026-05-21
A speculative inference framework that reduces the replanning latency of π0-style flow-matching dVLAs by using a lightweight draft and flow-consistency verification.
Korean
- DEFLECT: Delay-Robust Execution via Flow-matching Likelihood-Estimated Counterfactual Tuning for VLA Policies 2026-05-20
An offline post-training method that improves the delay robustness of asynchronous VLAs using label-free preference pairs, in which an action derived from a fresh observation is preferred over one derived from a stale observation.
Korean, Writing
- OxyGen: Unified KV Cache Management for VLA Inference under Multi-Task Parallelism 2026-05-19
An inference system for MoT VLAs that unifies management of the observation KV cache shared by action and language tasks, reducing redundant prefill and resource contention while raising both action frequency and language throughput.
Korean