Jiwon Song

Ph.D. Student @ SNU VLSI Lab

Electrical and Computer Engineering, Seoul National University

Building efficient AI systems across models, hardware, and deployment.

Research Focus

Efficient AI systems across model optimization, hardware-aware acceleration, and practical deployment.

Compression

Quantization, pruning, sparsity, and token-efficient computation for scalable AI.

Acceleration

Hardware-aware methods that turn algorithmic efficiency into practical speedups.

Systems

Serving and deployment paths that reduce inference cost in real workloads.

News

Recent updates.

May 2026

Token Sparse Attention has been accepted to ICML 2026.

Publications

Research on efficient inference, reasoning acceleration, and practical system optimization for large language models.

2026

Preprint

CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection

Jiwon Song, Dongwon Jo, Beomseok Kang, Jae-Joon Kim

arXiv preprint arXiv:2605.16839

Paper Code

ICML 2026

Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection

Dongwon Jo, Beomseok Kang, Jiwon Song, Jae-Joon Kim

The 43rd International Conference on Machine Learning (ICML 2026)

Paper Code

Preprint

RelayGen: Intra-Generation Model Switching for Efficient Reasoning

Jiwon Song, Yoongon Kim, Jae-Joon Kim

arXiv preprint arXiv:2602.06454

Paper Code

2025

Preprint

LiteStage: Latency-aware Layer Skipping for Multi-stage Reasoning

Beomseok Kang, Jiwon Song, Jae-Joon Kim

arXiv preprint arXiv:2510.14211

Paper

NeurIPS 2025

Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning

Jiwon Song, Dongwon Jo, Yulhwa Kim, Jae-Joon Kim

The 39th Annual Conference on Neural Information Processing Systems (NeurIPS 2025)

Paper Code

ACL 2026 Findings

FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration

Dongwon Jo*, Jiwon Song*, Yulhwa Kim, Jae-Joon Kim

*Equal contribution

Findings of the Association for Computational Linguistics: ACL 2026

Paper Code

2024

ICML 2024

SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks

Jiwon Song*, Kyungseok Oh*, Taesu Kim, Hyungjun Kim, Yulhwa Kim, Jae-Joon Kim

*Equal contribution

The 41st International Conference on Machine Learning (ICML 2024)

Paper Code

Education

2024 - Present

Ph.D. Student (Integrated M.S./Ph.D. Program)

Electrical and Computer Engineering, Seoul National University

Advisor: Jae-Joon Kim

2018 - 2024

Bachelor of Science

Electrical and Computer Engineering, Seoul National University

Summa Cum Laude, GPA 3.9/4.3

Experiences

SqueezeBits

Research Intern

Summer 2022

DNN quantization

Technical Writer

2024

Blog posts comparing vLLM and TensorRT-LLM
vLLM vs TensorRT-LLM #6: Weight-Only Quantization
vLLM vs TensorRT-LLM #8: KV Cache Quantization

VLSI Lab

Undergraduate Researcher

2021 - 2022, 2023 - 2024

Seoul National University

LLM quantization and pruning
Post-training quantization of vision transformers

Teaching

Teaching Assistant

Spring 2024

Dept. of Electrical and Computer Engineering, Seoul National University

430.207B Introduction to Electronic Circuits and Laboratory

Military

Military Service

2020 - 2021

Republic of Korea Army

Sergeant