BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//132.216.98.100//NONSGML kigkonsult.se iCalcreator 2.20.4//
BEGIN:VEVENT
UID:20251006T204224EDT-4335D3wnlu@132.216.98.100
DTSTAMP:20251007T004224Z
DESCRIPTION:Title: On Mixture of Experts in Large-Scale Statistical Machine Learning Applications.\n\nAbstract:\n\nMixtures of experts (MoEs)\, a class of statistical machine learning models that combine multiple models\, known as experts\, to form more complex and accurate models\, have been integrated into deep learning architectures to improve the ability of these architectures and AI models to capture the heterogeneity of the data and to scale them up without increasing the computational cost. In a mixture of experts\, each expert specializes in a different aspect of the data\, and the experts' outputs are combined by a gating function to produce the final output. Parameter and expert estimates therefore play a crucial role\, enabling statisticians and data scientists to articulate and make sense of the diverse patterns present in the data. However\, the statistical behaviors of parameter and expert estimates in a mixture of experts have remained poorly understood\, owing to the complex interaction between the gating function and the expert parameters.\n\nIn the first part of the talk\, we investigate the performance of the least squares estimator (LSE) under a deterministic MoE model in which the data are sampled according to a regression model\, a setting that has remained largely unexplored. We establish a condition called strong identifiability to characterize the convergence behavior of various types of expert functions. We demonstrate that the rates for estimating strongly identifiable experts\, namely the widely used feed-forward networks with sigmoid and tanh activation functions\, are substantially faster than those for polynomial experts\, which we show to exhibit a surprisingly slow estimation rate.\n\nIn the second part of the talk\, we show that these theoretical insights shed light on understanding and improving important practical applications in machine learning and artificial intelligence (AI)\, including effectively scaling up massive AI models with several billion parameters\, efficiently fine-tuning large-scale AI models for downstream tasks\, and enhancing the performance of the Transformer model\, a state-of-the-art deep learning architecture\, with a novel self-attention mechanism.\n\nSpeaker\n\nNhat Ho is currently an Assistant Professor of Statistics and Data Science at the University of Texas at Austin. He is a core member of the University of Texas at Austin Machine Learning Laboratory and senior personnel of the Institute for Foundations of Machine Learning. He is currently an associate editor of the Electronic Journal of Statistics and an area chair of ICML\, ICLR\, AISTATS\, etc. His current research focuses on the interplay of four principles of statistics and data science: (1) Heterogeneity of complex data\, including mixture and hierarchical models and Bayesian nonparametrics\; (2) Stability and optimality of optimization and sampling algorithms for solving statistical machine learning models\; (3) Scalability and efficiency of optimal transport for machine learning and deep learning applications\; (4) Interpretability\, efficiency\, and robustness of massive and complex machine learning models.\n
DTSTART:20241001T193000Z
DTEND:20241001T203000Z
LOCATION:Room 1104\, Burnside Hall\, 805 rue Sherbrooke Ouest\, Montreal\, QC\, H3A 0B9\, CA
SUMMARY:Nhat Ho (University of Texas at Austin)
URL:/mathstat/channels/event/nhat-ho-university-texas-austin-360728
END:VEVENT
END:VCALENDAR