Deep Learning: Classics and Trends


Coming Up

Date Presenter Topic or Paper
2024.10.18 Joseph Miller Transformer Circuit Faithfulness Metrics Are Not Robust
2024.10.25 Lunjun Zhang Generative Verifiers: Reward Modeling as Next-Token Prediction
2024.11.01 Jack Morris Contextual Document Embeddings
2024.11.08 Divyam Madaan A Framework for Multi-modal Learning: Jointly Modeling Inter- & Intra-Modality Dependencies
2024.11.15 John Hewitt Instruction Following without Instruction Tuning
2024.11.22 Rajat Modi On Asynchronous Perception Machine for Test-Time-Training


Deep Learning: Classics and Trends (DLCT) is a paper reading group run by Rosanne since 2018. The idea is simple: each week we learn about one paper from a speaker who is kind enough to spend an hour telling us about it (usually, but not always, their own work).

We generally do not record, in order to create a safe space that allows "stupid questions", which we have found to be the best way to connect and learn.

Subscribe with your email to receive a weekly reminder from Rosanne containing the Zoom link to join the meeting. You can also find the meeting info on the MLC Events Calendar.

Nominate a speaker, a paper, or just share what you think. Self-nominations are welcome and encouraged! We also use this form to collect anonymous feedback.

If you are scheduled to give a talk, first of all: thank you! Here is what to expect.

Read more about the group's genesis story, underlying philosophy, and random trivia on its original site.

Past Events

Date Presenter Topic or Paper
2024.10.11 Rosanne Liu A few papers encountered at COLM 2024 [Slides]
2024.09.27 Abdul Fatir Ansari Chronos: Learning the Language of Time Series [Slides]
2024.09.20 Olivia Simin Fan DoGE: Domain Reweighting with Generalization Estimation [Slides]
2024.09.06 Vaidehi Patil Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks [Slides]
2024.08.30 Rosanne Liu Double Descent: what's that paper about again? [1] [2] [Slides]
2024.08.23 Robert Lange The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery [Slides] [Recording]
2024.08.16 Niloofar Mireshghallah Can LLMs Keep a Secret? Membership Inference Attacks and Contextual Integrity for Language [1] [2] [3] [4] [Slides] [Recording]
2024.08.09 Amanda Bertsch In-context learning with Long context models [Slides]
2024.08.02 Harshay Shah Decomposing and Editing Predictions by Modeling Model Computation [Slides]
2024.07.26 Gabriel Mukobi Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? [Slides]
2024.07.19 Zack Ankner Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models [Slides]
2024.07.12 Ziqiao Ma Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations [Slides]
2024.06.28 Gowthami Somepalli Measuring Style Similarity in Diffusion Models [Slides]
2024.06.21 Sanae Lotfi Non-Vacuous Generalization Bounds for Large Language Models [Slides]
2024.06.07 MARL Research Team, InstaDeep Growing the MARL software ecosystem in JAX [Slides] [Recording]
2024.05.31 Rishabh Agarwal Improving LLMs using self-generated data [1] [2] [3] [Slides] [Recording]
2024.05.24 Qiyao Wei Defining Expertise: Applications to Treatment Effect Estimation [Slides]
2024.05.17 Sofiane Ennadir, Yassine Abbahaddou Bounding the Expected Robustness of Graph Neural Networks Subject to Node Feature Attacks [Slides]
2024.05.03 Sunny Sanyal Pre-training with a little "less" Data and Compute [1] [2] [Slides] [Recording]
2024.04.26 Damien Teney Neural Redshift: Random Networks are not Random Functions [Slides]
2024.04.12 Diganta Misra Just Say the Name: Online Continual Learning with Category Names Only via Data Generation [Slides] [Recording]
2024.04.05 Richard Song OmniPred: Training Language Models as Universal Regressors [Slides]
2024.03.29 Hao Chen Understanding and Mitigating the Pre-training Noise on Downstream Tasks [Slides]
2024.03.22 Nathan Lambert RewardBench: Evaluating Reward Models for Language Modeling [Slides]
2024.03.15 Bram Grooten Efficient Focus for Autonomous Agents: on Generalization in Deep RL [1] [2] [Slides]
2024.03.08 Ildus Sadrtdinov To Stay or Not to Stay in the Pre-train Basin: Insights on Ensembling in Transfer Learning [Slides]
2024.03.01 Muhammad Khalifa GRACE: Discriminator-Guided Chain-of-Thought Reasoning [Slides]
2024.02.16 Hugh Chen Algorithms to estimate Shapley value feature attributions [Slides]
2024.02.09 Noam Razin Vanishing Gradients in Reinforcement Finetuning of Language Models [Slides]
2024.02.02 Olivia Wiles Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning [Slides]
2024.01.26 Elan Rosenfeld Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization [Slides] [Recording]
2024.01.19 Ayush Jain Beyond 2D Foundation Models: Towards building unified vision foundation models for 2D and 3D modalities [Slides]
2024.01.12 Chandan Singh Uniting Large Language Models and Decision Trees [1] [2] [Slides]
2024.01.05 Maksym Andriushchenko Why Do We Need Weight Decay in Modern Deep Learning? [Slides]
2023.12.08 Hattie Zhou What Algorithms can Transformers Learn? A Study in Length Generalization [Recording]
2023.12.01 Vidhisha Balachandran Understanding and Mitigating Factual Inconsistencies in Language Generation [1] [2] [Slides]
2023.11.24 Damien Teney Selective Mixup Helps with Distribution Shifts, But Not (Only) because of Mixup [Slides]
2023.11.17 Yaroslav Bulatov Engineer's Guide to Impractical Research
2023.11.10 Laurence Aitchison Deep kernel machines [1] [2] [Slides]
2023.11.03 Mingjie Sun A Simple and Effective Pruning Approach for Large Language Models [Slides]
2023.10.27 Chunyuan Li Multimodal Foundation Models: From Specialists to General-Purpose Assistants [Slides]
2023.10.20 Clare Lyle Understanding Plasticity in Neural Networks [Slides]
2023.10.13 Saurabh Dash Intriguing Properties of Quantization at Scale [Slides]
2023.10.06 Marta Garnelo Exploring the Space of Key-Value-Query Models with Intention
2023.09.29 Sean O'Brien Contrastive Decoding Improves Reasoning in Large Language Models [Slides]
2023.09.15 Evgenii Nikishin Deep Reinforcement Learning with Plasticity Injection [Slides]
2023.08.25 Nathan Lambert Reproducing Reinforcement Learning from Human Feedback [Slides]
2023.08.18 Rishabh Agarwal How to Distill Your Autoregressive Model [Slides]
2023.08.11 Potsawee Manakul SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models [Slides]
2023.08.04 Aryan Mehra ReViewNet + Planning-oriented Autonomous Driving [Slides]
2023.07.28 Amanda Bertsch Unlimiformer: Long-Range Transformers with Unlimited Length Input [Slides]
2023.07.21 Saurabh Garg Benchmarks and tasks for domain adaptation under distributional shift [Recording]
2023.07.07 Jing Yu Koh Generating Images with Multimodal Language Models [Slides]
2023.06.23 Tongzhou Wang Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning [Slides]
2023.06.16 A. Feder Cooper Is My Prediction Arbitrary? Measuring Self-Consistency in Fair Classification [Slides]
2023.06.02 Austin Xu HandsOff: Labeled Dataset Generation With No Additional Human Annotations
2023.05.26 Tian Jin Pruning's Effect on Generalization Through the Lens of Training and Regularization
2023.05.19 Alexia Jolicoeur-Martineau PopulAtion Parameter Averaging (PAPA) [Slides]
2023.04.28 Robert Lange Discovering Evolution Strategies via Meta-Black-Box Optimization [Slides]
2023.04.21 Gargi Balasubramaniam Augmented Language Models: a Survey [Slides] [Recording]
2023.04.14 Sung Min (Sam) Park TRAK: Attributing Model Behavior at Scale [Slides]
2023.04.07 Muqeeth Soft Merging of Experts with Adaptive Routing [Slides]
2023.03.31 Gowthami Somepalli Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models [Slides]
2023.03.24 Rahim Entezari The Role of Pre-training Data in Transfer Learning [Slides] [Recording]
2023.03.17 Annie S. Chen, Yoonho Lee Project and Probe: Sample-Efficient Domain Adaptation by Interpolating Orthogonal Features [Slides]
2023.03.03 Keerthana Gopalakrishnan Towards a Robotics Foundation Model [1] [3] [4] [5] [Slides] [Recording]
2023.02.24 Rishi Bommasani Trustworthy Social Bias Measurement [Slides]
2023.02.17 Kartik Chandra, Audrey Xie Gradient Descent: The Ultimate Optimizer [Slides]
2023.02.10 Micah Goldblum Bridging the gap between deep learning theory and practice [Slides]
2023.02.03 Mengzhou Xia Training Trajectories of Language Models Across Scales [Slides]
2023.01.27 Bill Peebles Scalable Diffusion Models with Transformers [Slides]
2023.01.20 Hattie Zhou Teaching Algorithmic Reasoning via In-context Learning [Slides]
2023.01.13 Rosanne Liu Learning with Non-Data [Slides]
2022.12.16 Keller Jordan REPAIR: REnormalizing Permuted Activations for Interpolation Repair [Slides]
2022.12.09 Shoaib Ahmed Siddiqui Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics [Slides]
2022.11.18 Ben Poole DreamFusion: Text-to-3D using 2D Diffusion
2022.11.11 Adam Dziedzic Dataset Inference for Self-Supervised Models [Slides]
2022.11.04 Chris Rytting Out of One, Many: Using Language Models to Simulate Human Samples [Slides] [Recording]
2022.10.28 Rishabh Agarwal Beyond Tabula Rasa: Reincarnating Reinforcement Learning [Recording]
2022.10.21 Samuel Ainsworth Git Re-Basin: Merging Models modulo Permutation Symmetries [Slides] [Recording]
2022.10.14 Vivek Natarajan Self Supervised Learning for Medical Imaging [1] [2] [3] [Slides]
2022.10.07 Miguel Angel Bautista GAUDI: A Neural Architect for Immersive 3D Scene Generation
2022.09.30 Mathilde Caron Masked Siamese Networks for Label-Efficient Learning
2022.09.23 Tim Dettmers LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale [Slides]
2022.09.16 Sanae Lotfi Bayesian Model Selection, the Marginal Likelihood, and Generalization
2022.09.02 Yutian Chen Towards Learning Universal Hyperparameter Optimizers with Transformers [Slides] [Recording]
2022.08.26 VPT team, MineDojo team, Evocraft team Minecraft Team Jam [Recording]
2022.08.19 Vincent Sitzmann Machine Learning for Inverse Graphics
2022.08.12 Christina Baek Agreement-on-the-Line: Predicting the Performance of Neural Networks under Distribution Shift [Slides]
2022.08.05 Hossein Mobahi Sharpness-aware minimization (SAM): Current State and Future Directions
2022.07.29 Pamela Mishkin DALL-E 2 Series III: Safety and Mitigations Planning for DALL-E 2 [Recording]
2022.07.22 Felipe Such DALL-E 2 Series II: Deploying DALLE-2 safely at scale [Recording]
2022.07.15 Aditya Ramesh DALL-E 2 Series I: Manipulating Images with DALL-E 2 [Slides] [Recording]
2022.07.01 Pete Florence Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language [Slides]
2022.06.17 Greg Yang Tuning GPT-3 on a Single GPU via Zero-Shot Hyperparameter Transfer [1] [2]
2022.06.10 Utku Evci Beyond Static Network Architectures [1] [2]
2022.05.20 Evgenii Nikishin The Primacy Bias in Deep Reinforcement Learning [Slides]
2022.05.06 Dan Zhang A Full-Stack Search Technique for Domain Optimized Deep Learning Accelerators
2022.04.22 Germán Kruszewski Sampling from Discrete Energy-Based Models with Quality/Efficiency Trade-offs [Slides]
2022.04.15 Robert Lange On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning [Slides]
2022.04.08 Jan Drgona On the Stochastic Stability of Deep Markov Models [Slides]
2022.04.01 Hattie Zhou Fortuitous Forgetting in Connectionist Networks [Slides]
2022.03.18 Bryan Bischof Advancing mathematics by guiding human intuition with AI [Slides]
2022.03.11 Rahim Entezari The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks [Slides]
2022.03.04 Jacob Hilton WebGPT: Browser-assisted question-answering with human feedback [Slides]
2022.02.25 Saining Xie A ConvNet for the 2020s [Slides]
2022.02.18 Karsten Kreis Score-Based Generative Modeling with Critically-Damped Langevin Diffusion [Slides]
2022.02.11 Andrea Schioppa Paying less Attention for Language Modeling and Generation [1] [2] [3] [4] [Slides]
2022.02.04 Arash Vahdat Tackling the Generative Learning Trilemma with Accelerated Diffusion Models [1] [2] [Slides]
2022.01.21 Beidi Chen Simple and Efficient Sparse Training for Neural Network Models [Slides]
2022.01.14 Albert Gu Efficiently Modeling Long Sequences with Structured State Spaces [Slides]
2022.01.07 Charline Le Lan On the Generalization of Representations in Reinforcement Learning [Slides]
2021.12.17 Kyunghyun Cho Oversmoothing of <eos> in Neural Autoregressive Modeling [Slides]
2021.12.10 Becca Roelofs Is ImageNet Solved?: Evaluating Machine Accuracy [Slides]
2021.12.03 David Patterson How to Have a Bad Career [Slides]
2021.11.19 James Martens Deep Kernel Shaping [Slides]
2021.11.12 Gautam Kamath Differentially Private Fine-tuning of Language Models [Slides]
2021.11.05 Rishabh Agarwal How to Avoid Fooling Ourselves in Deep RL research [Slides]
2021.10.29 Mitchell Wortsman Robust fine-tuning of zero-shot models [Slides]
2021.10.15 Wuyang Chen Neural Architecture Search on ImageNet in Four GPU Hours [Slides]
2021.10.08 Adit Radha Simple, Fast, and Flexible Framework for Matrix Completion with Infinite Width Neural Networks [Slides]
2021.10.01 Ilan Price Dense for the Price of Sparse: Improved Performance of Sparsely Initialized Networks via a Subspace Offset [Slides]
2021.09.24 Preetum Nakkiran Distributional Generalization: A New Kind of Generalization [Slides]
2021.09.17 Stanislaw Jastrzebski Reverse-engineering implicit regularization due to large learning rates in deep learning [1] [2] [Slides]
2021.09.10 Emilien Dupont The curse of discretization and learning distributions of functions [1] [2] [Slides]
2021.08.27 Srishti Yadav Fair Attribute Classification through Latent Space De-biasing [Slides]
2021.08.20 Jonathan Ho Recent progress in diffusion models [1] [2] [3] [4] [5]
2021.08.13 Kale-ab Tessera Keep the Gradients Flowing: Using Gradient Flow to Study Sparse Network Optimization [Slides]
2021.07.30 Jaehoon Lee Dataset Meta-Learning from Kernel Ridge-Regression [1] [2] [Slides]
2021.07.16 Evgenii Nikishin Learning What Matters: Beyond Maximum Likelihood in Model-Based RL [Slides]
2021.07.09 Aviral Kumar Making Deep Reinforcement Learning Easier to Use: Alleviating Optimization and Tuning Challenges in Deep RL [1] [2] [3] [Slides]
2021.06.25 Josh Roy To Infinite (Visual) Transfer and Beyond [Slides]
2021.06.18 Ben Mildenhall Neural Volumetric Rendering: How NeRF Works [1] [2] [Slides]
2021.06.11 Preetum Nakkiran The Deep Bootstrap: Good Online Learners are Good Offline Generalizers [Slides]
2021.06.04 Sharon Zhou Evaluating the Disentanglement of Deep Generative Models through Manifold Topology [Slides]
2021.05.21 Richard Song Closing the Sim-To-Real Gap with Evolutionary Meta-Learning [Slides]
2021.05.14 Rosanne Liu A few papers I saw at ICLR 2021 [Slides]
2021.04.30 Angjoo Kanazawa Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image [Slides]
2021.04.23 Corentin Tallec Bootstrap your own latent: A new approach to self-supervised Learning [Slides]
2021.04.16 Pieter-Jan Hoedt, Frederik Kratzert MC-LSTM: Adding mass conservation to RNNs [Slides]
2021.04.09 Lilian Weng Asymmetric self-play for automatic goal discovery in robotic manipulation [Slides]
2021.04.02 Hady Elhasar, Muhammad Khalifa, Marc Dymetman A Distributional Approach to Controlled Text Generation [Slides]
2021.03.26 Xinlei Chen Exploring Simple Siamese Representation Learning and Beyond [Slides]
2021.03.19 Jay J. Thiagarajan Improving Reliability and Generalization of Deep Models via Prediction Calibration [1] [2] [3] [Slides]
2021.03.12 Rohan Anil Scalable Second Order Optimization for Deep Learning [Slides]
2021.03.05 Rishabh Agarwal Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning [Slides]
2021.02.26 Robin Tibor Schirrmeister, Polina Kirichenko Understanding Semantic Anomaly Detection with Deep Convolutional Generative Networks [1] [2] [Slides]
2021.02.19 Krzysztof Choromanski Rethinking Attention with Performers - Towards New Transformers' Revolution [Slides]
2021.02.12 Yang Song Generative Modeling by Estimating Gradients of the Data Distribution [1] [2] [3] [Slides]
2021.02.05 Rosanne Liu Unconventional ways of training neural networks and what they teach us about model capacity [Slides]
2021.01.29 Johannes Brandstetter Hopfield Networks is All You Need [Slides]
2021.01.22 Andrey Malinin Uncertainty Estimation with Prior Networks [1] [2] [3] [4] [Slides]
2021.01.15 Jong Wook Kim Learning Transferable Visual Models From Natural Language Supervision [Slides]
2021.01.08 Liyuan Liu Understanding the Difficulty of Training Transformers [Slides]
2020.12.18 Julien Cornebise AI for Good and Ethics-Washing: a Self-Defense Primer [Slides]
2020.12.04 Rishabh Agarwal How I Learned To Stop Worrying And Love Offline RL [Slides]
2020.11.20 Sachit Menon PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models [Slides]
2020.11.13 Luke Metz Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves [Slides]
2020.11.06 Karl Cobbe Phasic Policy Gradient [Slides]
2020.10.30 Angelos Katharopoulos Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention [Slides]
2020.10.23 Ryan Lowe Learning to Summarize with Human Feedback [Slides]
2020.10.16 Jason Lee Latent Variable Models and Iterative Refinement for Non-Autoregressive Neural Machine Translation [1] [2] [3] [Slides]
2020.10.09 Utku Evci Difficulty of Sparse Training and RigL [1] [2]
2020.10.02 Shrimai Prabhumoye Controllable Text Generation: Should machines reflect the way humans interact in society? [1] [2] [3] [Slides]
2020.09.25 Sidak Pal Singh Model Fusion via Optimal Transport [Slides]
2020.09.18 Katherine Ye Penrose: From Mathematical Notation to Beautiful Diagrams [Slides]
2020.09.11 Jesse Mu Compositional Explanations of Neurons [Slides]
2020.09.04 Yian Yin Unequal effects of the COVID-19 pandemic on scientists
2020.08.28 Arianna Ornaghi Gender Attitudes in the Judiciary: Evidence from U.S. Circuit Courts [Slides]
2020.08.21 Anna Goldie, Azalia Mirhoseini Chip Placement with Deep Reinforcement Learning [Paper]
2020.08.14 Zhongqi Miao Deep Learning and Realistic Datasets [1] [2] [Slides]
2020.07.31 Dan Hendrycks Out-of-distribution robustness in computer vision and NLP [1] [2] [Slides]
2020.07.24 Ben Mann Language Models are Few-Shot Learners [Slides]

...

More past events reside permanently on the original site.