[ds-undergrads] Data Science talk by Prof. Samet Oymak, Friday May 6th, 12-1pm, MRB Seminar Room

tsotras at cs.ucr.edu
Sun May 1 14:23:10 PDT 2022


Please see below the information about the DS Seminar this Friday.
vt


---------------------------- Original Message ----------------------------
Subject: [UCR_DataScience] Data Science talk by Prof. Samet Oymak, Friday
May 6th, 12-1pm, MRB Seminar Room
From:    tsotras at cs.ucr.edu
Date:    Sat, April 30, 2022 2:31 pm
To:      datascience at lists.ucr.edu
--------------------------------------------------------------------------

The next Data Science Seminar will be this coming Friday, May 6th, 2022,
from 12:00-1:00pm at the MRB Seminar Room (1st floor).

**** Pizza and refreshments will be provided ****

To keep track of the number of attendees, please *register* at:
https://www.eventbrite.com/e/data-science-talk-tickets-331275533037

The talk will be given by Prof. Samet Oymak, Department of Electrical and
Computer Engineering, UCR

Title: Understanding Large ML Models through the Structure of Feature
Covariance

Abstract:

An overarching goal in machine learning is to enable accurate statistical
inference in the setting where the sample size is less than the number of
parameters. This overparameterized setting is particularly common in deep
learning, where it is typical to train large neural nets with relatively
small sample sizes and little concern about overfitting. In this talk, we
highlight how structure within the data is a catalyst for the empirical
success of these large models. After linking deep nets to linear models,
we show that the eigen-structure of the feature covariance can help
explain empirical phenomena such as noise robustness, the double descent
curve, model compression, and the benefits of perfectly fitting the
training data. In particular, we highlight that a typical feature
covariance has a spiked structure with a few large eigenvalues and many
smaller ones. We proceed to discuss: (1) for data with label noise,
regularization is useful for restricting the optimization process to the
large eigen-directions and reducing overfitting; and (2) for (mostly)
noiseless data, incorporating the small eigen-directions is crucial for
striking a good bias/variance tradeoff. This in turn explains why larger
models work better despite fitting the training data perfectly with no
regularization. Finally, we explain how our high-dimensional analysis
framework based on Gaussian process theory facilitates these findings.
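
For readers who would like a concrete feel for the spiked-covariance
picture above, here is a minimal, self-contained Python/NumPy sketch (not
the speaker's code; the dimensions, eigenvalues, and regularization
strength are illustrative assumptions). It draws features whose covariance
has a few large eigenvalues and many small ones, then compares a ridge fit
against the unregularized minimum-norm interpolating fit under noisy and
noiseless labels, mirroring points (1) and (2) of the abstract.

import numpy as np

rng = np.random.default_rng(0)

n, p = 100, 400      # overparameterized: fewer samples than features
k_spikes = 10        # a few large eigenvalues, many small ones
eigvals = np.concatenate([np.full(k_spikes, 25.0),
                          np.full(p - k_spikes, 0.1)])

def sample(n, noise_std):
    # Features with a diagonal "spiked" covariance and a linear target.
    X = rng.standard_normal((n, p)) * np.sqrt(eigvals)
    beta = rng.standard_normal(p) / np.sqrt(p)
    y = X @ beta + noise_std * rng.standard_normal(n)
    return X, y, beta

def fit(X, y, lam):
    # Ridge regression in dual form; lam -> 0 recovers the minimum-norm
    # interpolating least-squares solution.
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(len(y)), y)

def population_risk(beta_hat, beta):
    # Expected squared prediction error under the same spiked covariance.
    return float(np.sum(eigvals * (beta_hat - beta) ** 2))

for noise_std, label in [(1.0, "noisy labels"), (0.0, "noiseless labels")]:
    X, y, beta = sample(n, noise_std)
    err_interp = population_risk(fit(X, y, 1e-8), beta)
    err_ridge = population_risk(fit(X, y, 10.0), beta)
    print(f"{label}: min-norm interpolation {err_interp:.3f}"
          f"  vs  ridge {err_ridge:.3f}")

Running the script prints the population risk of each estimator in the two
regimes; with noisy labels the ridge fit should typically come out ahead,
while with noiseless labels the interpolating fit remains competitive.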



------------------------------------
Sponsored by the UCR Data Science Center, the Data Science talks aim to
foster collaborations between "core" Data Science faculty (from the
CSE/ECE/Stat Departments) and faculty/visitors from other sciences who
face Data Science problems in their research. These informal gatherings
are open to interested faculty and graduate students. Each meeting starts
with a talk describing research problems, followed by a discussion of
questions, open problems, ideas for possible collaborations, etc.

A full list of previous seminars appears at:
http://datascience.ucr.edu/news

Please forward this email to other colleagues or graduate students in your
lab who may be interested.

Moreover, if you are interested in giving a Data Science related talk,
please contact me.

Sincerely,
Vassilis Tsotras
Professor, CSE Department
Director, Data Science Major








_______________________________________________
DataScience mailing list
DataScience at lists.ucr.edu
https://lists.ucr.edu/mailman/listinfo/datascience



