2024.10.11 |
Rosanne Liu |
A few papers encountered at COLM 2024
[Slides]
|
2024.09.27 |
Abdul Fatir Ansari |
Chronos: Learning the Language of Time Series
[Slides]
|
2024.09.20 |
Olivia Simin Fan |
DoGE: Domain Reweighting with Generalization Estimation
[Slides]
|
2024.09.06 |
Vaidehi Patil |
Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks
[Slides]
|
2024.08.30 |
Rosanne Liu |
Double Descent: what's that paper about again? [1] [2]
[Slides]
|
2024.08.23 |
Robert Lange |
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
[Slides]
[Recording]
|
2024.08.16 |
Niloofar Mireshghallah |
Can LLMs Keep a Secret? Membership Inference Attacks and Contextual Integrity for Language [1] [2] [3] [4]
[Slides]
[Recording]
|
2024.08.09 |
Amanda Bertsch |
In-Context Learning with Long-Context Models
[Slides]
|
2024.08.02 |
Harshay Shah |
Decomposing and Editing Predictions by Modeling Model Computation
[Slides]
|
2024.07.26 |
Gabriel Mukobi |
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?
[Slides]
|
2024.07.19 |
Zack Ankner |
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
[Slides]
|
2024.07.12 |
Ziqiao Ma |
Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations
[Slides]
|
2024.06.28 |
Gowthami Somepalli |
Measuring Style Similarity in Diffusion Models
[Slides]
|
2024.06.21 |
Sanae Lotfi |
Non-Vacuous Generalization Bounds for Large Language Models
[Slides]
|
2024.06.07 |
MARL Research Team, InstaDeep |
Growing the MARL software ecosystem in JAX
[Slides]
[Recording]
|
2024.05.31 |
Rishabh Agarwal |
Improving LLMs using self-generated data [1] [2] [3]
[Slides]
[Recording]
|
2024.05.24 |
Qiyao Wei |
Defining Expertise: Applications to Treatment Effect Estimation
[Slides]
|
2024.05.17 |
Sofiane Ennadir, Yassine Abbahaddou |
Bounding the Expected Robustness of Graph Neural Networks Subject to Node Feature Attacks
[Slides]
|
2024.05.03 |
Sunny Sanyal |
Pre-training with a little "less" Data and Compute [1] [2]
[Slides]
[Recording]
|
2024.04.26 |
Damien Teney |
Neural Redshift: Random Networks are not Random Functions
[Slides]
|
2024.04.12 |
Diganta Misra |
Just Say the Name: Online Continual Learning with Category Names Only via Data Generation
[Slides]
[Recording]
|
2024.04.05 |
Richard Song |
OmniPred: Training Language Models as Universal Regressors
[Slides]
|
2024.03.29 |
Hao Chen |
Understanding and Mitigating the Pre-training Noise on Downstream Tasks
[Slides]
|
2024.03.22 |
Nathan Lambert |
RewardBench: Evaluating Reward Models for Language Modeling
[Slides]
|
2024.03.15 |
Bram Grooten |
Efficient Focus for Autonomous Agents: on Generalization in Deep RL [1] [2]
[Slides]
|
2024.03.08 |
Ildus Sadrtdinov |
To Stay or Not to Stay in the Pre-train Basin: Insights on Ensembling in Transfer Learning
[Slides]
|
2024.03.01 |
Muhammad Khalifa |
GRACE: Discriminator-Guided Chain-of-Thought Reasoning
[Slides]
|
2024.02.16 |
Hugh Chen |
Algorithms to estimate Shapley value feature attributions
[Slides]
|
2024.02.09 |
Noam Razin |
Vanishing Gradients in Reinforcement Finetuning of Language Models
[Slides]
|
2024.02.02 |
Olivia Wiles |
Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning
[Slides]
|
2024.01.26 |
Elan Rosenfeld |
Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization
[Slides]
[Recording]
|
2024.01.19 |
Ayush Jain |
Beyond 2D Foundation Models: Towards building unified vision foundation models for 2D and 3D modalities
[Slides]
|
2024.01.12 |
Chandan Singh |
Uniting Large Language Models and Decision Trees [1] [2]
[Slides]
|
2024.01.05 |
Maksym Andriushchenko |
Why Do We Need Weight Decay in Modern Deep Learning?
[Slides]
|
2023.12.08 |
Hattie Zhou |
What Algorithms can Transformers Learn? A Study in Length Generalization
[Recording]
|
2023.12.01 |
Vidhisha Balachandran |
Understanding and Mitigating Factual Inconsistencies in Language Generation [1] [2]
[Slides]
|
2023.11.24 |
Damien Teney |
Selective Mixup Helps with Distribution Shifts, But Not (Only) because of Mixup
[Slides]
|
2023.11.17 |
Yaroslav Bulatov |
Engineer's Guide to Impractical Research
|
2023.11.10 |
Laurence Aitchison |
Deep kernel machines [1] [2]
[Slides]
|
2023.11.03 |
Mingjie Sun |
A Simple and Effective Pruning Approach for Large Language Models
[Slides]
|
2023.10.27 |
Chunyuan Li |
Multimodal Foundation Models: From Specialists to General-Purpose Assistants
[Slides]
|
2023.10.20 |
Clare Lyle |
Understanding Plasticity in Neural Networks
[Slides]
|
2023.10.13 |
Saurabh Dash |
Intriguing Properties of Quantization at Scale
[Slides]
|
2023.10.06 |
Marta Garnelo |
Exploring the Space of Key-Value-Query Models with Intention
|
2023.09.29 |
Sean O'Brien |
Contrastive Decoding Improves Reasoning in Large Language Models
[Slides]
|
2023.09.15 |
Evgenii Nikishin |
Deep Reinforcement Learning with Plasticity Injection
[Slides]
|
2023.08.25 |
Nathan Lambert |
Reproducing Reinforcement Learning from Human Feedback
[Slides]
|
2023.08.18 |
Rishabh Agarwal |
How to Distill Your Autoregressive Model
[Slides]
|
2023.08.11 |
Potsawee Manakul |
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
[Slides]
|
2023.08.04 |
Aryan Mehra |
ReViewNet + Planning-oriented Autonomous Driving
[Slides]
|
2023.07.28 |
Amanda Bertsch |
Unlimiformer: Long-Range Transformers with Unlimited Length Input
[Slides]
|
2023.07.21 |
Saurabh Garg |
Benchmarks and tasks for domain adaptation under distributional shift
[Recording]
|
2023.07.07 |
Jing Yu Koh |
Generating Images with Multimodal Language Models
[Slides]
|
2023.06.23 |
Tongzhou Wang |
Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning
[Slides]
|
2023.06.16 |
A. Feder Cooper |
Is My Prediction Arbitrary? Measuring Self-Consistency in Fair Classification
[Slides]
|
2023.06.02 |
Austin Xu |
HandsOff: Labeled Dataset Generation With No Additional Human Annotations
|
2023.05.26 |
Tian Jin |
Pruning's Effect on Generalization Through the Lens of Training and Regularization
|
2023.05.19 |
Alexia Jolicoeur-Martineau |
PopulAtion Parameter Averaging (PAPA)
[Slides]
|
2023.04.28 |
Robert Lange |
Discovering Evolution Strategies via Meta-Black-Box Optimization
[Slides]
|
2023.04.21 |
Gargi Balasubramaniam |
Augmented Language Models: a Survey
[Slides]
[Recording]
|
2023.04.14 |
Sung Min (Sam) Park |
TRAK: Attributing Model Behavior at Scale
[Slides]
|
2023.04.07 |
Muqeeth |
Soft Merging of Experts with Adaptive Routing
[Slides]
|
2023.03.31 |
Gowthami Somepalli |
Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models
[Slides]
|
2023.03.24 |
Rahim Entezari |
The Role of Pre-training Data in Transfer Learning
[Slides]
[Recording]
|
2023.03.17 |
Annie S. Chen, Yoonho Lee |
Project and Probe: Sample-Efficient Domain Adaptation by Interpolating Orthogonal Features
[Slides]
|
2023.03.03 |
Keerthana Gopalakrishnan |
Towards a Robotics Foundation Model [1] [3] [4] [5]
[Slides]
[Recording]
|
2023.02.24 |
Rishi Bommasani |
Trustworthy Social Bias Measurement
[Slides]
|
2023.02.17 |
Kartik Chandra, Audrey Xie |
Gradient Descent: The Ultimate Optimizer
[Slides]
|
2023.02.10 |
Micah Goldblum |
Bridging the gap between deep learning theory and practice
[Slides]
|
2023.02.03 |
Mengzhou Xia |
Training Trajectories of Language Models Across Scales
[Slides]
|
2023.01.27 |
Bill Peebles |
Scalable Diffusion Models with Transformers
[Slides]
|
2023.01.20 |
Hattie Zhou |
Teaching Algorithmic Reasoning via In-context Learning
[Slides]
|
2023.01.13 |
Rosanne Liu |
Learning with Non-Data
[Slides]
|
2022.12.16 |
Keller Jordan |
REPAIR: REnormalizing Permuted Activations for Interpolation Repair
[Slides]
|
2022.12.09 |
Shoaib Ahmed Siddiqui |
Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics
[Slides]
|
2022.11.18 |
Ben Poole |
DreamFusion: Text-to-3D using 2D Diffusion
|
2022.11.11 |
Adam Dziedzic |
Dataset Inference for Self-Supervised Models
[Slides]
|
2022.11.04 |
Chris Rytting |
Out of One, Many: Using Language Models to Simulate Human Samples
[Slides]
[Recording]
|
2022.10.28 |
Rishabh Agarwal |
Beyond Tabula Rasa: Reincarnating Reinforcement Learning
[Recording]
|
2022.10.21 |
Samuel Ainsworth |
Git Re-Basin: Merging Models modulo Permutation Symmetries
[Slides]
[Recording]
|
2022.10.14 |
Vivek Natarajan |
Self Supervised Learning for Medical Imaging [1] [2] [3]
[Slides]
|
2022.10.07 |
Miguel Angel Bautista |
GAUDI: A Neural Architect for Immersive 3D Scene Generation
|
2022.09.30 |
Mathilde Caron |
Masked Siamese Networks for Label-Efficient Learning
|
2022.09.23 |
Tim Dettmers |
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
[Slides]
|
2022.09.16 |
Sanae Lotfi |
Bayesian Model Selection, the Marginal Likelihood, and Generalization
|
2022.09.02 |
Yutian Chen |
Towards Learning Universal Hyperparameter Optimizers with Transformers
[Slides]
[Recording]
|
2022.08.26 |
VPT team, MineDojo team, Evocraft team |
Minecraft Team Jam
[Recording]
|
2022.08.19 |
Vincent Sitzmann |
Machine Learning for Inverse Graphics
|
2022.08.12 |
Christina Baek |
Agreement-on-the-Line: Predicting the Performance of Neural Networks under Distribution Shift
[Slides]
|
2022.08.05 |
Hossein Mobahi |
Sharpness-aware minimization (SAM): Current State and Future Directions
|
2022.07.29 |
Pamela Mishkin |
DALL-E 2 Series III: Safety and Mitigations Planning for DALL-E 2
[Recording]
|
2022.07.22 |
Felipe Such |
DALL-E 2 Series II: Deploying DALL-E 2 safely at scale
[Recording]
|
2022.07.15 |
Aditya Ramesh |
DALL-E 2 Series I: Manipulating Images with DALL-E 2
[Slides]
[Recording]
|
2022.07.01 |
Pete Florence |
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
[Slides]
|
2022.06.17 |
Greg Yang |
Tuning GPT-3 on a Single GPU via Zero-Shot Hyperparameter Transfer [1] [2]
|
2022.06.10 |
Utku Evci |
Beyond Static Network Architectures [1] [2]
|
2022.05.20 |
Evgenii Nikishin |
The Primacy Bias in Deep Reinforcement Learning
[Slides]
|
2022.05.06 |
Dan Zhang |
A Full-Stack Search Technique for Domain Optimized Deep Learning Accelerators
|
2022.04.22 |
Germán Kruszewski |
Sampling from Discrete Energy-Based Models with Quality/Efficiency Trade-offs
[Slides]
|
2022.04.15 |
Robert Lange |
On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning
[Slides]
|
2022.04.08 |
Jan Drgona |
On the Stochastic Stability of Deep Markov Models
[Slides]
|
2022.04.01 |
Hattie Zhou |
Fortuitous Forgetting in Connectionist Networks
[Slides]
|
2022.03.18 |
Bryan Bischof |
Advancing mathematics by guiding human intuition with AI
[Slides]
|
2022.03.11 |
Rahim Entezari |
The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks
[Slides]
|
2022.03.04 |
Jacob Hilton |
WebGPT: Browser-assisted question-answering with human feedback
[Slides]
|
2022.02.25 |
Saining Xie |
A ConvNet for the 2020s
[Slides]
|
2022.02.18 |
Karsten Kreis |
Score-Based Generative Modeling with Critically-Damped Langevin Diffusion
[Slides]
|
2022.02.11 |
Andrea Schioppa |
Paying less Attention for Language Modeling and Generation [1] [2] [3] [4]
[Slides]
|
2022.02.04 |
Arash Vahdat |
Tackling the Generative Learning Trilemma with Accelerated Diffusion Models [1] [2]
[Slides]
|
2022.01.21 |
Beidi Chen |
Simple and Efficient Sparse Training for Neural Network Models
[Slides]
|
2022.01.14 |
Albert Gu |
Efficiently Modeling Long Sequences with Structured State Spaces
[Slides]
|
2022.01.07 |
Charline Le Lan |
On the Generalization of Representations in Reinforcement Learning
[Slides]
|
2021.12.17 |
Kyunghyun Cho |
Oversmoothing of <eos> in Neural Autoregressive Modeling
[Slides]
|
2021.12.10 |
Becca Roelofs |
Is ImageNet Solved? Evaluating Machine Accuracy
[Slides]
|
2021.12.03 |
David Patterson |
How to Have a Bad Career
[Slides]
|
2021.11.19 |
James Martens |
Deep Kernel Shaping
[Slides]
|
2021.11.12 |
Gautam Kamath |
Differentially Private Fine-tuning of Language Models
[Slides]
|
2021.11.05 |
Rishabh Agarwal |
How to Avoid Fooling Ourselves in Deep RL research
[Slides]
|
2021.10.29 |
Mitchell Wortsman |
Robust fine-tuning of zero-shot models
[Slides]
|
2021.10.15 |
Wuyang Chen |
Neural Architecture Search on ImageNet in Four GPU Hours
[Slides]
|
2021.10.08 |
Adit Radha |
Simple, Fast, and Flexible Framework for Matrix Completion with Infinite Width Neural Networks
[Slides]
|
2021.10.01 |
Ilan Price |
Dense for the Price of Sparse: Improved Performance of Sparsely Initialized Networks via a Subspace Offset
[Slides]
|
2021.09.24 |
Preetum Nakkiran |
Distributional Generalization: A New Kind of Generalization
[Slides]
|
2021.09.17 |
Stanislaw Jastrzebski |
Reverse-engineering implicit regularization due to large learning rates in deep learning [1] [2]
[Slides]
|
2021.09.10 |
Emilien Dupont |
The curse of discretization and learning distributions of functions [1] [2]
[Slides]
|
2021.08.27 |
Srishti Yadav |
Fair Attribute Classification through Latent Space De-biasing
[Slides]
|
2021.08.20 |
Jonathan Ho |
Recent progress in diffusion models [1] [2] [3] [4] [5]
|
2021.08.13 |
Kale-ab Tessera |
Keep the Gradients Flowing: Using Gradient Flow to Study Sparse Network Optimization
[Slides]
|
2021.07.30 |
Jaehoon Lee |
Dataset Meta-Learning from Kernel Ridge-Regression [1] [2]
[Slides]
|
2021.07.16 |
Evgenii Nikishin |
Learning What Matters: Beyond Maximum Likelihood in Model-Based RL
[Slides]
|
2021.07.09 |
Aviral Kumar |
Making Deep Reinforcement Learning Easier to Use: Alleviating Optimization and Tuning Challenges in Deep RL [1] [2] [3]
[Slides]
|
2021.06.25 |
Josh Roy |
To Infinite (Visual) Transfer and Beyond
[Slides]
|
2021.06.18 |
Ben Mildenhall |
Neural Volumetric Rendering: How NeRF Works [1] [2]
[Slides]
|
2021.06.11 |
Preetum Nakkiran |
The Deep Bootstrap: Good Online Learners are Good Offline Generalizers
[Slides]
|
2021.06.04 |
Sharon Zhou |
Evaluating the Disentanglement of Deep Generative Models through Manifold Topology
[Slides]
|
2021.05.21 |
Richard Song |
Closing the Sim-To-Real Gap with Evolutionary Meta-Learning
[Slides]
|
2021.05.14 |
Rosanne Liu |
A few papers I saw at ICLR 2021
[Slides]
|
2021.04.30 |
Angjoo Kanazawa |
Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image
[Slides]
|
2021.04.23 |
Corentin Tallec |
Bootstrap your own latent: A new approach to self-supervised Learning
[Slides]
|
2021.04.16 |
Pieter-Jan Hoedt, Frederik Kratzert |
MC-LSTM: Adding mass conservation to RNNs
[Slides]
|
2021.04.09 |
Lilian Weng |
Asymmetric self-play for automatic goal discovery in robotic manipulation
[Slides]
|
2021.04.02 |
Hady Elsahar, Muhammad Khalifa, Marc Dymetman
A Distributional Approach to Controlled Text Generation
[Slides]
|
2021.03.26 |
Xinlei Chen |
Exploring Simple Siamese Representation Learning and Beyond
[Slides]
|
2021.03.19 |
Jay J. Thiagarajan |
Improving Reliability and Generalization of Deep Models via Prediction Calibration [1] [2] [3]
[Slides]
|
2021.03.12 |
Rohan Anil |
Scalable Second Order Optimization for Deep Learning
[Slides]
|
2021.03.05 |
Rishabh Agarwal |
Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
[Slides]
|
2021.02.26 |
Robin Tibor Schirrmeister, Polina Kirichenko |
Understanding Semantic Anomaly Detection with Deep Convolutional Generative Networks [1] [2]
[Slides]
|
2021.02.19 |
Krzysztof Choromanski |
Rethinking Attention with Performers - Towards New Transformers' Revolution
[Slides]
|
2021.02.12 |
Yang Song |
Generative Modeling by Estimating Gradients of the Data Distribution [1] [2] [3]
[Slides]
|
2021.02.05 |
Rosanne Liu |
Unconventional ways of training neural networks and what they teach us about model capacity
[Slides]
|
2021.01.29 |
Johannes Brandstetter |
Hopfield Networks is All You Need
[Slides]
|
2021.01.22 |
Andrey Malinin |
Uncertainty Estimation with Prior Networks [1] [2] [3] [4]
[Slides]
|
2021.01.15 |
Jong Wook Kim |
Learning Transferable Visual Models From Natural Language Supervision
[Slides]
|
2021.01.08 |
Liyuan Liu |
Understanding the Difficulty of Training Transformers
[Slides]
|
2020.12.18 |
Julien Cornebise |
AI for Good and Ethics-Washing: a Self-Defense Primer
[Slides]
|
2020.12.04 |
Rishabh Agarwal |
How I Learned To Stop Worrying And Love Offline RL
[Slides]
|
2020.11.20 |
Sachit Menon |
PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models
[Slides]
|
2020.11.13 |
Luke Metz |
Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves
[Slides]
|
2020.11.06 |
Karl Cobbe |
Phasic Policy Gradient
[Slides]
|
2020.10.30 |
Angelos Katharopoulos |
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
[Slides]
|
2020.10.23 |
Ryan Lowe |
Learning to Summarize with Human Feedback
[Slides]
|
2020.10.16 |
Jason Lee |
Latent Variable Models and Iterative Refinement for Non-Autoregressive Neural Machine Translation [1] [2] [3]
[Slides]
|
2020.10.09 |
Utku Evci |
Difficulty of Sparse Training and RigL [1] [2]
|
2020.10.02 |
Shrimai Prabhumoye |
Controllable Text Generation: Should machines reflect the way humans interact in society? [1] [2] [3]
[Slides]
|
2020.09.25 |
Sidak Pal Singh |
Model Fusion via Optimal Transport
[Slides]
|
2020.09.18 |
Katherine Ye |
Penrose: From Mathematical Notation to Beautiful Diagrams
[Slides]
|
2020.09.11 |
Jesse Mu |
Compositional Explanations of Neurons
[Slides]
|
2020.09.04 |
Yian Yin |
Unequal effects of the COVID-19 pandemic on scientists
|
2020.08.28 |
Arianna Ornaghi |
Gender Attitudes in the Judiciary: Evidence from U.S. Circuit Courts
[Slides]
|
2020.08.21 |
Anna Goldie, Azalia Mirhoseini |
Chip Placement with Deep Reinforcement Learning [Paper]
|
2020.08.14 |
Zhongqi Miao |
Deep Learning and Realistic Datasets [1] [2]
[Slides]
|
2020.07.31 |
Dan Hendrycks |
Out-of-distribution robustness in computer vision and NLP [1] [2]
[Slides]
|
2020.07.24 |
Ben Mann |
Language Models are Few-Shot Learners
[Slides]
|