Energy based models (EBMs) are drawing a lot of recent attention. Importantly, you can write the gradient of the log-likelihood of an EBM with respect to the parameters. However, this gradient is commonly stated in papers without a derivation, so I thought I would derive it here.

Consider an energy based model

$p(x) = \frac{1}{Z(\theta)} \, e^{-E_\theta(x)}$

with normalizing constant $Z(\theta)$. Papers often state that the gradient of $\log p_\theta(x)$ with respect to $\theta$ is

$\frac{\partial}{\partial \theta} \log p_\theta(x) = \mathbb{E}_{p_\theta(x)} \left[ \frac{\partial}{\partial \theta} E_\theta(x) \right] - \frac{\partial}{\partial \theta} E_\theta(x).$

But where does this come from? Here we derive it using the log-derivative trick and with one key assumption. We start by writing out the gradient

$\frac{\partial}{\partial \theta} \log p_\theta(x) = \frac{\partial}{\partial \theta} \left[ -\log Z(\theta) - E_\theta(x) \right] = - \frac{\partial}{\partial \theta} \log Z(\theta) - \frac{\partial}{\partial \theta} E_\theta(x)$

and notice that we have already identified the second term in the gradient. The first term requires some care. We start by using the log-derivative trick

$\frac{\partial}{\partial \theta} \log Z(\theta) = \frac{ \frac{\partial}{\partial \theta} Z(\theta)}{Z(\theta)}.$

Next, we derive $\frac{\partial}{\partial \theta} Z(\theta)$ with the key assumption that we can interchange integration and differentiation

$\frac{\partial}{\partial \theta} Z(\theta) = \frac{\partial}{\partial \theta} \int e^{-E_\theta(x)} \mathrm{d}x = \int \frac{\partial}{\partial \theta} e^{-E_\theta(x)} \mathrm{d}x.$

Putting together the pieces gives us

\begin{align} \frac{\partial}{\partial \theta} \log Z(\theta) &= \frac{1}{Z(\theta)} \int \frac{\partial}{\partial \theta} e^{-E_\theta(x)} \mathrm{d}x \\ & = \int \frac{1}{Z(\theta)} \frac{\partial}{\partial \theta} e^{-E_\theta(x)} \mathrm{d}x \\ & = - \int \frac{1}{Z(\theta)} e^{-E_\theta(x)} \frac{\partial}{\partial \theta} E_\theta(x) \mathrm{d}x \\ & = - \mathbb{E}_{p_\theta(x)} \left[ \frac{\partial}{\partial \theta} E_\theta(x) \right]. \end{align}

We are done! We can plus this into the equation above (keeping track of minus signs) to get

$\frac{\partial}{\partial \theta} \log p_\theta(x) = \mathbb{E}_{p_\theta(x)}\left[\frac{\partial}{\partial \theta} E\_\theta(x) \right]- \frac{\partial}{\partial \theta} E_\theta(x).$ ##### David Zoltowski
###### Postdoctoral Fellow in Statistics

My research interests include statistical neuroscience and topics in Bayesian machine learning.