Should you care about GFlowNets? What are they anyway? Learn about how GFlowNets are aiding drug discovery and reasoning in large language models!
**Like, subscribe, and share if you find this video valuable!**
Tutorial: https://milayb.notion.site/The-GFlowNet-Tutorial-95434ef0e2d94c24aab90e69b30be9b3
0:00 - Why care about GFlowNets?
0:54 - The problems GFlowNets solve
1:39 - A concrete example: drug discovery
3:53 - What GFlowNet really is
4:46 - Applications: GFlowNet-EM
5:58 - Applications: Better LLM reasoning
6:55 - Conclusion
Papers mentioned:
- GFlowNet for drug discovery (first GFlowNet paper)
https://arxiv.org/abs/2106.04399
- Jointly training a GFlowNet and an energy-based model
https://arxiv.org/abs/2202.01361
- GFlowNet-EM
https://arxiv.org/abs/2302.06576
- GFlowNet for better reasoning in LLMs
https://arxiv.org/pdf/2310.04363.pdf
Follow me on Twitter:
https://twitter.com/edwardjhu
🙏 This video would not be possible without my wonderful labmates at Mila and, of course, Yoshua.
Why should you care about GFlowNets? Is it the next Transformer? Is it Yoshua Bengio's pet project? Or is it one of those ideas that are so 2010s, when all the cool kids are training large language models in the 2020s? Today, I'll talk about why you should care about GFlowNets and why it is the future. My name is Edward Hu. I'm not exactly impartial here, because Yoshua Bengio is my PhD advisor and I worked directly on GFlowNets, but I also love what works! I invented low-rank adaptation, or LoRA, when I was a researcher at Microsoft working with GPT-3 -- and by the way, here's a video on LoRA. Today, I'm a research scientist at OpenAI. So, I want to talk about what makes GFlowNets so exciting and how they're going to shape the future of AI. First of all, GFlowNet sounds like a type of neural network, like a Transformer or a ResNet. However, it is not. GFlowNet stands for generative flow network, and it is a learning algorithm. Before I tell you more about the algorithm itself, I'm going to start with the problem it solves. So, if you ask an AI practitioner
what their worst nightmare is, most people will tell you it's either overfitting or hyperparameter tuning. By the way, I have another video on muTransfer, a technique that lets you tune hyperparameters for a large model much more easily, which I'll link here, but today we're going to talk about overfitting. Overfitting usually happens when we ask the model to maximize something. For example, we might be maximizing the likelihood of a dataset; the best way -- the perfect way -- to maximize the likelihood of a dataset is to memorize it completely without "understanding" it. And if you just memorize the answers to certain questions, it's not going to help you generalize to questions you've never seen before. Something similar happens in reinforcement learning as well: if you maximize the reward, very often what you find is hacks. I'm going to give you a really
concrete scenario where this shows up. Say you're inventing a new drug molecule through trial and error. What you're doing is basically reinforcement learning, albeit with a really expensive reward function, because the real reward function is: you take this drug, run a clinical trial, and you get a reward in the end. That is extremely slow and expensive, so what people do in practice is collect some clinical data and then train a neural network to simulate the real reward function. Now, you can imagine what happens if you maximize reward under this proxy model, which is far from perfect. You're going to find a molecule that obtains an extremely high reward under the model, but the chance of that molecule actually being the drug you want is low, because there are so many ways to trick a neural network into thinking a molecule has a high reward. However, this reward function is still capturing some information about drug worthiness, even though we don't trust it 100%. Practically speaking, we don't just want the single best molecule under this reward -- we want many good ones. Even better, these good molecules should be as different as possible from one another. What we do then is try them all in the real world, and hopefully some of them are actually good. If we take just the best one, it is almost guaranteed that this one molecule is exploiting some imperfections in our reward model. In this example, I'm highlighting the importance of having diversity, as opposed to finding just the max, as in maximum
likelihood estimation or reward maximization. In fact, here's an idea: say we have a reward function for molecules. What if, instead of just getting a single molecule, we have a generator of molecules? Here's what the generator does: it generates a molecule with probability proportional to the reward, meaning that if a molecule has a really high reward, then it's more likely to be generated, and if we have a bunch of molecules that are equally good under the reward function, then they are equally likely to be generated. Now, as we generate molecules using this generator, we're going to get candidates, and most of them are going to have high rewards, because low-reward ones have a low probability of being generated.
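To make the contrast concrete, here's a minimal Python sketch of reward maximization versus reward-proportional sampling. The molecule names and proxy rewards are made up for illustration, and the sampler here just draws from an explicit table; a real GFlowNet trains a neural network to approximate this behavior over an exponentially large space:

```python
import random

# Hypothetical proxy rewards for a handful of candidate molecules
# (names and values are invented for this example).
proxy_reward = {"mol_A": 8.0, "mol_B": 7.5, "mol_C": 7.0, "mol_D": 0.5}

# Reward maximization: always returns the single top candidate,
# even if its high score is just the proxy model being fooled.
best = max(proxy_reward, key=proxy_reward.get)

# Reward-proportional sampling (what a trained GFlowNet approximates):
# high-reward molecules come up often, but diversity is preserved.
mols, rewards = zip(*proxy_reward.items())
samples = random.choices(mols, weights=rewards, k=1000)

print("argmax picks only:", best)
print("sampler keeps diversity:", {m: samples.count(m) for m in mols})
```

Note how the low-reward `mol_D` is rarely drawn, while the three comparable high-reward candidates each appear frequently, which is exactly the diverse candidate pool you want to carry into real-world testing.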
Imagine an objective function that allows you to train a generator like that. The objective function and algorithm that let you do exactly that are the generative flow network, or GFlowNet, and this drug discovery example is actually the motivating use case in the first GFlowNet paper. However, GFlowNet is much more than just drug discovery. The high-level takeaway is that GFlowNet is a novel training algorithm. Instead of looking at a dataset or a reward function and asking "how can I find the function that maximizes this?", GFlowNet asks "okay, how can I find a sampler -- a neural network sampler -- that samples proportionally to a given reward function?" If we're given a dataset instead of a reward function, there's another paper from our lab which says "okay, given a dataset, I'm going to first learn an energy-based model, and then I'll train a sampler that samples proportionally to my energy-based model." So, GFlowNet is really shifting the question we ask: instead of maximizing something, we're matching a distribution. And finally, I'm going to give you a quick example
of how this can be useful in the real world. Actually, I'm going to give you two examples -- two papers that I led. I'm happy to dive into these two papers in future videos, but today I'm just going to give you a quick taste of what they look like. The first paper is called GFlowNet-EM, and it tackles a problem fundamental to machine learning. The second one is more empirical: it has something to do with large language models. Many of us have heard of the expectation-maximization algorithm, which is used to find maximum-likelihood estimates in latent-variable models. The big idea is that in the expectation step, we want to sample from a posterior distribution over latent variables, but for non-trivial models this posterior is usually intractable, so people fall back on methods like Markov chain Monte Carlo, or they make simplifying assumptions so the posterior becomes easy to model. Now, I have this intractable posterior distribution, which can be described by a reward function. Reinforcement learning can help us
find the maximum of this posterior, but we want to match the distribution. We want to draw samples from the distribution for learning, which is a hard inference problem. Here, GFlowNet converts this hard inference problem, usually solved with simulation, into something we can solve with a neural network, and we love training big neural networks these days, because we're good at it. That makes GFlowNet a bridge between classical problems in machine learning and scaling neural networks, which is the future of AI. In the second example, we
have a large language model, and we want to use it to solve, say, a certain kind of reasoning task. However, we don't have a lot of data points. What we have is maybe 10, 20, or 50 data points. What we're going to do, instead of fine-tuning, which easily leads to overfitting, is search the posterior over potential reasoning chains under this model. Usually, people either find the most likely reasoning chain using reinforcement learning, or they use few-shot prompting, basically hoping the model will come up with a good reasoning chain if we ask it really nicely. Here, we're using GFlowNet to train the model to directly sample reasoning chains that could have led to the correct answer, in proportion to how likely each chain is to lead to the correct answer under the model. The result is that we're able to boost data efficiency in many cases. This paper will actually be an oral
presentation at ICLR this year, so maybe I'll see many of you in person. So, long story short: GFlowNet is not a new neural network architecture. It's a new learning algorithm that allows you to train a sampler that samples proportionally to a reward function, and it has many applications going forward, especially as we focus on improving the generalization and data efficiency of our neural networks. On the theoretical side, it has connections to maximum-entropy reinforcement learning with path-consistency objectives, which I'm happy to dive deeper into in a future video. If you find this video helpful, please like, subscribe, and share it with somebody else who might be interested. I'll see you in the next video!
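For readers who want one level more detail on "an objective function that trains such a sampler": one popular objective from the GFlowNet literature is trajectory balance. The toy below is a sketch under simplifying assumptions, not the paper's method: pure Python, made-up rewards, binary strings built one bit at a time (a tree, so the backward policy is trivial), and an exact hand-computed policy instead of a neural network. It checks that the trajectory balance loss is zero precisely when each object is sampled with probability proportional to its reward:

```python
import math

# Hypothetical rewards for the four terminal strings (made up).
R = {"00": 1.0, "01": 3.0, "10": 2.0, "11": 6.0}
Z = sum(R.values())  # total reward mass

def subtree_reward(prefix):
    """Sum of rewards of all terminal strings extending this prefix."""
    return sum(r for x, r in R.items() if x.startswith(prefix))

def ideal_policy(prefix, bit):
    """Forward policy that samples x with probability R(x)/Z:
    P(child | prefix) is proportional to reward mass under the child."""
    return subtree_reward(prefix + bit) / subtree_reward(prefix)

def uniform(prefix, bit):
    """A naive policy for comparison: pick each bit with prob 1/2."""
    return 0.5

def tb_loss(x, policy):
    """Trajectory balance loss for the trajectory that builds x:
    (log Z + sum of log P_F(step) - log R(x))^2, with P_B = 1 on a tree."""
    log_pf = sum(math.log(policy(x[:t], x[t])) for t in range(len(x)))
    return (math.log(Z) + log_pf - math.log(R[x])) ** 2

for x in R:
    print(x, tb_loss(x, ideal_policy), tb_loss(x, uniform))
```

In actual GFlowNet training, the policy and the scalar log Z are parameterized by a neural network and this squared loss is minimized by gradient descent over sampled trajectories; the toy only illustrates what the objective measures.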