KCL's Maughan Library on Chancery Lane

Invited Talks



Che Liu

PhD student
Imperial College London

From Language to Perception: Emergent Multimodal Reasoning in Foundation Models

Abstract. Large language models have demonstrated impressive reasoning capabilities, but much of this progress depends on supervision with Chain-of-Thought reasoning data. We show that strong reasoning ability can emerge even without such supervision, using a minimalist reinforcement-driven objective. Building on this insight, we extend our approach to include visual inputs and observe similar reasoning behaviors in multimodal settings. This suggests that reasoning is not tied to specific types of supervision or data formats, but can emerge naturally when the training process is properly aligned. These results point toward a new direction for developing unified and general-purpose models capable of reasoning across language and perception.
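
To make the idea of a "minimalist reinforcement-driven objective" concrete, the toy sketch below rewards only final-answer correctness with a simple policy-gradient update, with no Chain-of-Thought traces supervised at any point. It is an illustrative sketch, not the speaker's implementation: the names policy.sample_completions, policy.log_prob, and is_correct are hypothetical stand-ins for whatever model wrapper and answer checker are used.

    # Toy sketch: outcome-only policy-gradient step, no Chain-of-Thought supervision.
    # `policy`, its methods, and `is_correct` are hypothetical placeholders.
    import torch

    def reinforce_step(policy, prompts, answers, optimizer, samples_per_prompt=4):
        """One update that rewards only whether the final answer is correct."""
        losses = []
        for prompt, gold in zip(prompts, answers):
            # Sample several completions; the reasoning text itself is never labeled.
            completions = policy.sample_completions(prompt, n=samples_per_prompt)
            rewards = torch.tensor([float(is_correct(c, gold)) for c in completions])
            # Group-mean baseline reduces variance across the sampled completions.
            advantages = rewards - rewards.mean()
            for completion, adv in zip(completions, advantages):
                logp = policy.log_prob(prompt, completion)  # sum of token log-probs
                losses.append(-adv * logp)                  # REINFORCE-style objective
        loss = torch.stack(losses).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()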

Bio. Che Liu is a fourth-year PhD student at Imperial College London, supervised by Dr. Rossella Arcucci and Dr. Wenjia Bai. His research focuses on multimodal learning across vision, physiological signals, and language, with applications in medicine. His work has appeared at machine learning conferences such as ICML, NeurIPS, and ACL, as well as in medical journals including IEEE Transactions on Medical Imaging and NEJM AI. He has also undertaken research internships at AstraZeneca and Alibaba DAMO Academy, contributing to real-world, large-scale applications of multimodal learning in medical AI.


Dr Yifei Wang

Postdoctoral Researcher
MIT

Your Next-Token Prediction and Transformers Are Biased for Long-Context Modeling

Abstract. Next-token prediction and the Transformer architecture have long been the de facto standards for training language models. In this talk, we argue that both practices are inherently biased, and these biases become particularly pronounced under long context. Specifically, we identify the root causes behind (1) the discrepancy between next-token prediction performance (e.g., perplexity) and long-context benchmark scores, and (2) persistent position-bias phenomena in Transformers, such as lost-in-the-middle, attention sinks, and recency bias. Our analysis leads to principled training objectives and architectural insights that substantially improve LLMs’ performance in long-context settings. This talk is based on the following two recent papers:
1. What is Wrong with Perplexity for Long-context Language Modeling? ICLR 2025.
2. On the Emergence of Position Bias in Transformers. ICML 2025.
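
As a concrete illustration of the position-bias phenomena mentioned in the abstract (a toy probe, not code from either paper), one can place a key fact at different depths of a long context and check whether the model still retrieves it; uniformly worse retrieval at middle depths is the "lost-in-the-middle" pattern. Here ask_model and the filler text are hypothetical placeholders for whatever LLM interface is being evaluated.

    # Toy lost-in-the-middle probe; `ask_model` is a hypothetical LLM wrapper.
    def lost_in_the_middle_probe(ask_model, filler_sentences,
                                 depths=(0.0, 0.25, 0.5, 0.75, 1.0)):
        needle = "The access code for the archive is 7421."
        question = "What is the access code for the archive?"
        results = {}
        for depth in depths:
            # Insert the key sentence at a relative depth of the filler context.
            idx = int(depth * len(filler_sentences))
            context = filler_sentences[:idx] + [needle] + filler_sentences[idx:]
            answer = ask_model(" ".join(context) + "\n\n" + question)
            results[depth] = "7421" in answer
        # e.g. {0.0: True, 0.5: False, 1.0: True} would indicate mid-context loss.
        return results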

Bio. Yifei Wang is a postdoctoral researcher at MIT CSAIL, working with Professor Stefanie Jegelka. His research focuses on the theoretical and algorithmic foundations of self-supervised learning, foundation models, and AI safety. His work has received four best-paper awards and has been featured by Anthropic and MIT. He served as an area chair for ICLR 2024 and ICLR 2025. Before MIT, Yifei earned a PhD in Applied Mathematics, a BS in Data Science, and a BA in Philosophy from Peking University.