ML Collective

Deep Learning: Classics and Trends

Coming Up

Date	Presenter	Topic or Paper
2024.04.26	Damien Teney	Neural Redshift: Random Networks are not Random Functions
2024.05.03	Sunny Sanyal	Early Weight Averaging meets High Learning Rates for LLM Pre-training
2024.05.24	Qiyao Wei	Defining Expertise: Applications to Treatment Effect Estimation
2024.05.31	Rishabh Agarwal	Improving large language models using self-generated data
2024.06.14	Niloofar (Fatemeh) Mireshghallah	Can LLMs Keep a Secret? Membership Inference Attacks and Contextual Integrity for Language
2024.06.21	Sanae Lotfi	Non-Vacuous Generalization Bounds for Large Language Models
2024.06.28	Gowthami Somepalli	Memorization in Diffusion: A trilogy [1] [2] [3]

Deep Learning: Classics and Trends (DLCT) is a paper reading group that Rosanne has run since 2018. The idea is simple: each week we learn about one paper from a speaker who is kind enough to spend an hour telling us about it (usually their own work, though not always).

We do not record the sessions. This keeps them a safe space for “stupid questions”, which we have found to be the best way to connect and learn.

Subscribe with your email to receive a weekly reminder from Rosanne with the Zoom link for the meeting. You can also find the meeting info on the MLC Events Calendar.

Nominate a speaker, a paper, or just share what you think. Self-nominations are welcome and encouraged! We also use this form to collect anonymous feedback.

If you are scheduled to give a talk, first of all: thank you! Here is what to expect.

Read more about its genesis story, underlying philosophy and random trivia on its original site.

Past Events

Date	Presenter	Topic or Paper
2024.04.12	Diganta Misra	Just Say the Name: Online Continual Learning with Category Names Only via Data Generation
2024.04.05	Richard Song	OmniPred: Training Language Models as Universal Regressors [Slides]
2024.03.29	Hao Chen	Understanding and Mitigating the Pre-training Noise on Downstream Tasks [Slides]
2024.03.22	Nathan Lambert	RewardBench: Evaluating Reward Models for Language Modeling [Slides]
2024.03.15	Bram Grooten	Efficient Focus for Autonomous Agents: on Generalization in Deep RL [1] [2] [Slides]
2024.03.08	Ildus Sadrtdinov	To Stay or Not to Stay in the Pre-train Basin: Insights on Ensembling in Transfer Learning [Slides]
2024.03.01	Muhammad Khalifa	GRACE: Discriminator-Guided Chain-of-Thought Reasoning [Slides]
2024.02.16	Hugh Chen	Algorithms to estimate Shapley value feature attributions [Slides]
2024.02.09	Noam Razin	Vanishing Gradients in Reinforcement Finetuning of Language Models [Slides]
2024.02.02	Olivia Wiles	Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning [Slides]
2024.01.26	Elan Rosenfeld	Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization [Slides] [Recording]
2024.01.19	Ayush Jain	Beyond 2D Foundation Models: Towards building unified vision foundation models for 2D and 3D modalities [Slides]
2024.01.12	Chandan Singh	Uniting Large Language Models and Decision Trees [1] [2] [Slides]
2024.01.05	Maksym Andriushchenko	Why Do We Need Weight Decay in Modern Deep Learning? [Slides]
2023.12.08	Hattie Zhou	What Algorithms can Transformers Learn? A Study in Length Generalization [Recording]
2023.12.01	Vidhisha Balachandran	Understanding and Mitigating Factual Inconsistencies in Language Generation [1] [2] [Slides]
2023.11.24	Damien Teney	Selective Mixup Helps with Distribution Shifts, But Not (Only) because of Mixup [Slides]
2023.11.17	Yaroslav Bulatov	Engineer's Guide to Impractical Research
2023.11.10	Laurence Aitchison	Deep kernel machines [1] [2] [Slides]
2023.11.03	Mingjie Sun	A Simple and Effective Pruning Approach for Large Language Models [Slides]
2023.10.27	Chunyuan Li	Multimodal Foundation Models: From Specialists to General-Purpose Assistants [Slides]
2023.10.20	Clare Lyle	Understanding Plasticity in Neural Networks [Slides]
2023.10.13	Saurabh Dash	Intriguing Properties of Quantization at Scale [Slides]
2023.10.06	Marta Garnelo	Exploring the Space of Key-Value-Query Models with Intention
2023.09.29	Sean O'Brien	Contrastive Decoding Improves Reasoning in Large Language Models [Slides]
2023.09.15	Evgenii Nikishin	Deep Reinforcement Learning with Plasticity Injection [Slides]
2023.08.25	Nathan Lambert	Reproducing Reinforcement Learning from Human Feedback [Slides]
2023.08.18	Rishabh Agarwal	How to Distill Your Autoregressive Model [Slides]
2023.08.11	Potsawee Manakul	SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models [Slides]
2023.08.04	Aryan Mehra	ReViewNet + Planning-oriented Autonomous Driving [Slides]
2023.07.28	Amanda Bertsch	Unlimiformer: Long-Range Transformers with Unlimited Length Input [Slides]
2023.07.21	Saurabh Garg	Benchmarks and tasks for domain adaptation under distributional shift [Recording]
2023.07.07	Jing Yu Koh	Generating Images with Multimodal Language Models [Slides]
2023.06.23	Tongzhou Wang	Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning [Slides]
2023.06.16	A. Feder Cooper	Is My Prediction Arbitrary? Measuring Self-Consistency in Fair Classification [Slides]
2023.06.02	Austin Xu	HandsOff: Labeled Dataset Generation With No Additional Human Annotations
2023.05.26	Tian Jin	Pruning's Effect on Generalization Through the Lens of Training and Regularization
2023.05.19	Alexia Jolicoeur-Martineau	PopulAtion Parameter Averaging (PAPA) [Slides]
2023.04.28	Robert Lange	Discovering Evolution Strategies via Meta-Black-Box Optimization [Slides]
2023.04.21	Gargi Balasubramaniam	Augmented Language Models: a Survey [Slides] [Recording]
2023.04.14	Sung Min (Sam) Park	TRAK: Attributing Model Behavior at Scale [Slides]
2023.04.07	Muqeeth	Soft Merging of Experts with Adaptive Routing [Slides]
2023.03.31	Gowthami Somepalli	Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models [Slides]
2023.03.24	Rahim Entezari	The Role of Pre-training Data in Transfer Learning [Slides] [Recording]
2023.03.17	Annie S. Chen, Yoonho Lee	Project and Probe: Sample-Efficient Domain Adaptation by Interpolating Orthogonal Features [Slides]
2023.03.03	Keerthana Gopalakrishnan	Towards a Robotics Foundation Model [1] [3] [4] [5] [Slides] [Recording]
2023.02.24	Rishi Bommasani	Trustworthy Social Bias Measurement [Slides]
2023.02.17	Kartik Chandra, Audrey Xie	Gradient Descent: The Ultimate Optimizer [Slides]
2023.02.10	Micah Goldblum	Bridging the gap between deep learning theory and practice [Slides]
2023.02.03	Mengzhou Xia	Training Trajectories of Language Models Across Scales [Slides]
2023.01.27	Bill Peebles	Scalable Diffusion Models with Transformers [Slides]
2023.01.20	Hattie Zhou	Teaching Algorithmic Reasoning via In-context Learning [Slides]
2023.01.13	Rosanne Liu	Learning with Non-Data [Slides]
2022.12.16	Keller Jordan	REPAIR: REnormalizing Permuted Activations for Interpolation Repair [Slides]
2022.12.09	Shoaib Ahmed Siddiqui	Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics [Slides]
2022.11.18	Ben Poole	DreamFusion: Text-to-3D using 2D Diffusion
2022.11.11	Adam Dziedzic	Dataset Inference for Self-Supervised Models [Slides]
2022.11.04	Chris Rytting	Out of One, Many: Using Language Models to Simulate Human Samples [Slides] [Recording]
2022.10.28	Rishabh Agarwal	Beyond Tabula Rasa: Reincarnating Reinforcement Learning [Recording]
2022.10.21	Samuel Ainsworth	Git Re-Basin: Merging Models modulo Permutation Symmetries [Slides] [Recording]
2022.10.14	Vivek Natarajan	Self Supervised Learning for Medical Imaging [1] [2] [3] [Slides]
2022.10.07	Miguel Angel Bautista	GAUDI: A Neural Architect for Immersive 3D Scene Generation
2022.09.30	Mathilde Caron	Masked Siamese Networks for Label-Efficient Learning
2022.09.23	Tim Dettmers	LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale [Slides]
2022.09.16	Sanae Lotfi	Bayesian Model Selection, the Marginal Likelihood, and Generalization
2022.09.02	Yutian Chen	Towards Learning Universal Hyperparameter Optimizers with Transformers [Slides] [Recording]
2022.08.26	VPT team, MineDojo team, Evocraft team	Minecraft Team Jam [Recording]
2022.08.19	Vincent Sitzmann	Machine Learning for Inverse Graphics
2022.08.12	Christina Baek	Agreement-on-the-Line: Predicting the Performance of Neural Networks under Distribution Shift [Slides]
2022.08.05	Hossein Mobahi	Sharpness-aware minimization (SAM): Current State and Future Directions
2022.07.29	Pamela Mishkin	DALL-E 2 Series III: Safety and Mitigations Planning for DALL-E 2 [Recording]
2022.07.22	Felipe Such	DALL-E 2 Series II: Deploying DALL-E 2 safely at scale [Recording]
2022.07.15	Aditya Ramesh	DALL-E 2 Series I: Manipulating Images with DALL-E 2 [Slides] [Recording]
2022.07.01	Pete Florence	Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language [Slides]
2022.06.17	Greg Yang	Tuning GPT-3 on a Single GPU via Zero-Shot Hyperparameter Transfer [1] [2]
2022.06.10	Utku Evci	Beyond Static Network Architectures [1] [2]
2022.05.20	Evgenii Nikishin	The Primacy Bias in Deep Reinforcement Learning [Slides]
2022.05.06	Dan Zhang	A Full-Stack Search Technique for Domain Optimized Deep Learning Accelerators
2022.04.22	Germán Kruszewski	Sampling from Discrete Energy-Based Models with Quality/Efficiency Trade-offs [Slides]
2022.04.15	Robert Lange	On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning [Slides]
2022.04.08	Jan Drgona	On the Stochastic Stability of Deep Markov Models [Slides]
2022.04.01	Hattie Zhou	Fortuitous Forgetting in Connectionist Networks [Slides]
2022.03.18	Bryan Bischof	Advancing mathematics by guiding human intuition with AI [Slides]
2022.03.11	Rahim Entezari	The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks [Slides]
2022.03.04	Jacob Hilton	WebGPT: Browser-assisted question-answering with human feedback [Slides]
2022.02.25	Saining Xie	A ConvNet for the 2020s [Slides]
2022.02.18	Karsten Kreis	Score-Based Generative Modeling with Critically-Damped Langevin Diffusion [Slides]
2022.02.11	Andrea Schioppa	Paying less Attention for Language Modeling and Generation [1] [2] [3] [4] [Slides]
2022.02.04	Arash Vahdat	Tackling the Generative Learning Trilemma with Accelerated Diffusion Models [1] [2] [Slides]
2022.01.21	Beidi Chen	Simple and Efficient Sparse Training for Neural Network Models [Slides]
2022.01.14	Albert Gu	Efficiently Modeling Long Sequences with Structured State Spaces [Slides]
2022.01.07	Charline Le Lan	On the Generalization of Representations in Reinforcement Learning [Slides]
2021.12.17	Kyunghyun Cho	Oversmoothing of <eos> in Neural Autoregressive Modeling [Slides]
2021.12.10	Becca Roelofs	Is ImageNet Solved?: Evaluating Machine Accuracy [Slides]
2021.12.03	David Patterson	How to Have a Bad Career [Slides]
2021.11.19	James Martens	Deep Kernel Shaping [Slides]
2021.11.12	Gautam Kamath	Differentially Private Fine-tuning of Language Models [Slides]
2021.11.05	Rishabh Agarwal	How to Avoid Fooling Ourselves in Deep RL research [Slides]
2021.10.29	Mitchell Wortsman	Robust fine-tuning of zero-shot models [Slides]
2021.10.15	Wuyang Chen	Neural Architecture Search on ImageNet in Four GPU Hours [Slides]
2021.10.08	Adit Radha	Simple, Fast, and Flexible Framework for Matrix Completion with Infinite Width Neural Networks [Slides]
2021.10.01	Ilan Price	Dense for the Price of Sparse: Improved Performance of Sparsely Initialized Networks via a Subspace Offset [Slides]
2021.09.24	Preetum Nakkiran	Distributional Generalization: A New Kind of Generalization [Slides]
2021.09.17	Stanislaw Jastrzebski	Reverse-engineering implicit regularization due to large learning rates in deep learning [1] [2] [Slides]
2021.09.10	Emilien Dupont	The curse of discretization and learning distributions of functions [1] [2] [Slides]
2021.08.27	Srishti Yadav	Fair Attribute Classification through Latent Space De-biasing [Slides]
2021.08.20	Jonathan Ho	Recent progress in diffusion models [1] [2] [3] [4] [5]
2021.08.13	Kale-ab Tessera	Keep the Gradients Flowing: Using Gradient Flow to Study Sparse Network Optimization [Slides]
2021.07.30	Jaehoon Lee	Dataset Meta-Learning from Kernel Ridge-Regression [1] [2] [Slides]
2021.07.16	Evgenii Nikishin	Learning What Matters: Beyond Maximum Likelihood in Model-Based RL [Slides]
2021.07.09	Aviral Kumar	Making Deep Reinforcement Learning Easier to Use: Alleviating Optimization and Tuning Challenges in Deep RL [1] [2] [3] [Slides]
2021.06.25	Josh Roy	To Infinite (Visual) Transfer and Beyond [Slides]
2021.06.18	Ben Mildenhall	Neural Volumetric Rendering: How NeRF Works [1] [2] [Slides]
2021.06.11	Preetum Nakkiran	The Deep Bootstrap: Good Online Learners are Good Offline Generalizers [Slides]
2021.06.04	Sharon Zhou	Evaluating the Disentanglement of Deep Generative Models through Manifold Topology [Slides]
2021.05.21	Richard Song	Closing the Sim-To-Real Gap with Evolutionary Meta-Learning [Slides]
2021.05.14	Rosanne Liu	A few papers I saw at ICLR 2021 [Slides]
2021.04.30	Angjoo Kanazawa	Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image [Slides]
2021.04.23	Corentin Tallec	Bootstrap your own latent: A new approach to self-supervised Learning [Slides]
2021.04.16	Pieter-Jan Hoedt, Frederik Kratzert	MC-LSTM: Adding mass conservation to RNNs [Slides]
2021.04.09	Lilian Weng	Asymmetric self-play for automatic goal discovery in robotic manipulation [Slides]
2021.04.02	Hady Elsahar, Muhammad Khalifa, Marc Dymetman	A Distributional Approach to Controlled Text Generation [Slides]
2021.03.26	Xinlei Chen	Exploring Simple Siamese Representation Learning and Beyond [Slides]
2021.03.19	Jay J. Thiagarajan	Improving Reliability and Generalization of Deep Models via Prediction Calibration [1] [2] [3] [Slides]
2021.03.12	Rohan Anil	Scalable Second Order Optimization for Deep Learning [Slides]
2021.03.05	Rishabh Agarwal	Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning [Slides]
2021.02.26	Robin Tibor Schirrmeister, Polina Kirichenko	Understanding Semantic Anomaly Detection with Deep Convolutional Generative Networks [1] [2] [Slides]
2021.02.19	Krzysztof Choromanski	Rethinking Attention with Performers - Towards New Transformers' Revolution [Slides]
2021.02.12	Yang Song	Generative Modeling by Estimating Gradients of the Data Distribution [1] [2] [3] [Slides]
2021.02.05	Rosanne Liu	Unconventional ways of training neural networks and what they teach us about model capacity [Slides]
2021.01.29	Johannes Brandstetter	Hopfield Networks is All You Need [Slides]
2021.01.22	Andrey Malinin	Uncertainty Estimation with Prior Networks [1] [2] [3] [4] [Slides]
2021.01.15	Jong Wook Kim	Learning Transferable Visual Models From Natural Language Supervision [Slides]
2021.01.08	Liyuan Liu	Understanding the Difficulty of Training Transformers [Slides]
2020.12.18	Julien Cornebise	AI for Good and Ethics-Washing: a Self-Defense Primer [Slides]
2020.12.04	Rishabh Agarwal	How I Learned To Stop Worrying And Love Offline RL [Slides]
2020.11.20	Sachit Menon	PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models [Slides]
2020.11.13	Luke Metz	Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves [Slides]
2020.11.06	Karl Cobbe	Phasic Policy Gradient [Slides]
2020.10.30	Angelos Katharopoulos	Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention [Slides]
2020.10.23	Ryan Lowe	Learning to Summarize with Human Feedback [Slides]
2020.10.16	Jason Lee	Latent Variable Models and Iterative Refinement for Non-Autoregressive Neural Machine Translation [1] [2] [3] [Slides]
2020.10.09	Utku Evci	Difficulty of Sparse Training and RigL [1] [2]
2020.10.02	Shrimai Prabhumoye	Controllable Text Generation: Should machines reflect the way humans interact in society? [1] [2] [3] [Slides]
2020.09.25	Sidak Pal Singh	Model Fusion via Optimal Transport [Slides]
2020.09.18	Katherine Ye	Penrose: From Mathematical Notation to Beautiful Diagrams [Slides]
2020.09.11	Jesse Mu	Compositional Explanations of Neurons [Slides]
2020.09.04	Yian Yin	Unequal effects of the COVID-19 pandemic on scientists
2020.08.28	Arianna Ornaghi	Gender Attitudes in the Judiciary: Evidence from U.S. Circuit Courts [Slides]
2020.08.21	Anna Goldie, Azalia Mirhoseini	Chip Placement with Deep Reinforcement Learning [Paper]
2020.08.14	Zhongqi Miao	Deep Learning and Realistic Datasets [1] [2] [Slides]
2020.07.31	Dan Hendrycks	Out-of-distribution robustness in computer vision and NLP [1] [2] [Slides]
2020.07.24	Ben Mann	Language Models are Few-Shot Learners [Slides]

...

More past events reside permanently on the original site.