Department of Industrial Engineering & Decision Analytics [Joint IEDA/ISOM] seminar - On the Width Scaling of Neural Optimizers: A Matrix Operator Norm Perspective
A central question in modern deep learning and language models is how to design optimizers whose performance scales favorably with network width. We address this question by viewing neural-network optimizers such as AdamW and Muon through a unified lens, as instances of steepest descent under matrix operator norms. Within this framework, we align the optimizer geometry with the Lipschitz structure of the network’s forward map, impose a requirement of layerwise composability, and show that standard p→q operator-norm steepest-descent rules generally fail to compose across layers. To overcome this limitation, we introduce a family of matrix operator norm geometries (p, mean)→(q, mean) that admit closed-form, layerwise descent directions and yield practical optimizers such as a rescaled AdamW, row normalization, and column normalization. By construction, our rescaling recovers μP-style width scaling as a special case and provides predictable learning-rate transfer across widths for a broader class of optimizers. We further prove that the induced descent directions preserve standard convergence guarantees and achieve near width-insensitive smoothness for mappings (1, mean)→(q, mean) with q ≥ 2 and (p, mean)→(∞, mean), where smoothness is measured in the corresponding matrix-norm geometry. Finally, we show that this optimizer achieves improved width scaling compared with Muon, and that Muon in turn outperforms AdamW, suggesting a principled and practical route for mitigating dimensional dependence in large-scale optimization.
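To give a rough sense of the layerwise update rules mentioned in the abstract, the minimal sketch below contrasts row-normalized, column-normalized, and width-rescaled (μP-style) gradient steps on a single weight matrix. The particular scaling choices (e.g. dividing the step by fan-in) and function names are illustrative assumptions, not the speaker's exact construction.

```python
import numpy as np

def row_normalized_step(G, lr, eps=1e-8):
    # Normalize each row of the gradient to unit Euclidean norm,
    # so every output unit receives an update of comparable size.
    row_norms = np.linalg.norm(G, axis=1, keepdims=True)
    return -lr * G / (row_norms + eps)

def col_normalized_step(G, lr, eps=1e-8):
    # Normalize each column instead, equalizing the update per input feature.
    col_norms = np.linalg.norm(G, axis=0, keepdims=True)
    return -lr * G / (col_norms + eps)

def width_rescaled_step(G, lr, fan_in):
    # Hypothetical μP-style rescaling: shrink the step by the layer's fan-in
    # so its effect on activations stays roughly width-independent.
    return -(lr / fan_in) * G

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fan_out, fan_in = 256, 1024
    G = rng.standard_normal((fan_out, fan_in))
    for name, step in [
        ("row-normalized", row_normalized_step(G, lr=1e-2)),
        ("col-normalized", col_normalized_step(G, lr=1e-2)),
        ("width-rescaled", width_rescaled_step(G, lr=1e-2, fan_in=fan_in)),
    ]:
        print(name, float(np.linalg.norm(step)))
```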
Jiajin Li is a tenure-track Assistant Professor in the Operations & Logistics Division at the Sauder School of Business, University of British Columbia. She is also an associated faculty member of the Institute of Applied Mathematics (IAM) and the Department of Computer Science at UBC. Prior to joining UBC, she spent three years as a postdoctoral researcher in the Department of Management Science and Engineering (MS&E) at Stanford University. She received her Ph.D. in Systems Engineering and Engineering Management from the Chinese University of Hong Kong (CUHK) in 2021.