BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//132.216.98.100//NONSGML kigkonsult.se iCalcreator 2.20.4//
BEGIN:VEVENT
UID:20251006T204224EDT-4335D3wnlu@132.216.98.100
DTSTAMP:20251007T004224Z
DESCRIPTION:Title: On Mixture of Experts in Large-Scale Statistical Machine Learning Applications.\n\nAbstract:\n\nMixtures of experts (MoEs)\, a class of statistical machine learning models that combine multiple models\, known as experts\, to form more complex and accurate models\, have been integrated into deep learning architectures to improve the ability of these architectures and AI models to capture the heterogeneity of the data and to scale them up without increasing the computational cost. In a mixture of experts\, each expert specializes in a different aspect of the data\, and the experts' outputs are combined by a gating function to produce the final output. Parameter and expert estimates therefore play a crucial role\, enabling statisticians and data scientists to articulate and make sense of the diverse patterns present in the data. However\, the statistical behaviors of parameter and expert estimates in a mixture of experts have remained poorly understood\, owing to the complex interaction between the gating function and the expert parameters.\n\nIn the first part of the talk\, we investigate the performance of the least squares estimator (LSE) under a deterministic MoE model in which the data are sampled according to a regression model\, a setting that has remained largely unexplored. We establish a condition called strong identifiability to characterize the convergence behavior of various types of expert functions. We demonstrate that the rates for estimating strongly identifiable experts\, namely the widely used feed-forward networks with sigmoid and tanh activation functions\, are substantially faster than those for polynomial experts\, which we show to exhibit a surprisingly slow estimation rate.\n\nIn the second part of the talk\, we show that these theoretical insights shed light on understanding and improving important practical applications in machine learning and artificial intelligence (AI)\, including effectively scaling up massive AI models with several billion parameters\, efficiently fine-tuning large-scale AI models for downstream tasks\, and enhancing the performance of the Transformer model\, a state-of-the-art deep learning architecture\, with a novel self-attention mechanism.\n\nSpeaker\n\nNhat Ho is currently an Assistant Professor of Statistics and Data Science at the University of Texas at Austin. He is a core member of the University of Texas at Austin Machine Learning Laboratory and senior personnel of the Institute for Foundations of Machine Learning. He is currently an associate editor of the Electronic Journal of Statistics and an area chair of ICML\, ICLR\, AISTATS\, etc. His current research focuses on the interplay of four principles of statistics and data science: (1) Heterogeneity of complex data\, including mixture and hierarchical models and Bayesian nonparametrics\; (2) Stability and optimality of optimization and sampling algorithms for solving statistical machine learning models\; (3) Scalability and efficiency of optimal transport for machine learning and deep learning applications\; (4) Interpretability\, efficiency\, and robustness of massive and complex machine learning models.\n
DTSTART:20241001T193000Z
DTEND:20241001T203000Z
LOCATION:Room 1104\, Burnside Hall\, 805 rue Sherbrooke Ouest\, Montreal\, QC\, H3A 0B9\, CA
SUMMARY:Nhat Ho (University of Texas at Austin)
URL:/mathstat/channels/event/nhat-ho-university-texas-austin-360728
END:VEVENT
END:VCALENDAR