Entropy Simply Explained
The Equation
We have all come across the ubiquitous equation
\begin{equation} E = - \sum_x p(x) \log{p(x)} \end{equation} or its continuous cousin \begin{equation} E = - \int p(x) \log{p(x)} \, dx \end{equation}
whether we are doing statistical mechanics, physics, differential geometry, or even data science. But it is not at all clear at first glance why this is the right way to measure entropy, which quantifies the amount of disorder (or uncertainty) in a system. I mean, where does the \(p(x) \log{p(x)}\) term even come from?
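To make the discrete sum concrete, here is a minimal sketch in Python (the helper name `entropy` and the example distributions are just for illustration, not part of any standard library) that evaluates \(-\sum_x p(x)\log{p(x)}\) directly:

```python
import math

def entropy(probs, base=math.e):
    """Shannon entropy -sum p(x) log p(x) of a discrete distribution.

    Terms with p(x) = 0 contribute nothing, since p log p -> 0 as p -> 0.
    """
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair coin is maximally uncertain: entropy = log 2 (1 bit in base 2).
print(entropy([0.5, 0.5], base=2))   # 1.0
# A certain outcome carries no uncertainty at all.
print(entropy([1.0, 0.0], base=2))   # -0.0, i.e. zero
```

The base of the logarithm only rescales the result (bits for base 2, nats for the natural log), so it is left as a parameter here.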
Surprise
What we mean by surprise here is something very specific and technical. Given a sample space, each element \(x\) has an associated probability of occurrence \(p(x)\). Now suppose you pick an element at random. How surprising is it that you drew \(x\)? Intuitively, low probability should mean high surprise and vice versa, so one might suggest \(S(x)=1/p(x)\) as a measure of surprise. But if \(x\) has probability 1 of being picked, it should not be surprising at all that \(x\) was picked, i.e. the surprise should be 0, whereas \(1/p(x)\) would give 1. Taking a logarithm fixes this while keeping the inverse relationship: \(\log{1}=0\), and \(\log{1/p(x)}\) still grows without bound as \(p(x)\) shrinks. Thus, a better candidate is \begin{equation} S(x)=\log{1/p(x)}=-\log{p(x)}. \end{equation}
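As a quick sanity check, here is a small sketch (again in Python, using base-2 logarithms so that surprise is measured in bits; the choice of base and the function name are assumptions for illustration) tabulating \(S(x)\) for a few probabilities:

```python
import math

def surprise(p):
    """Surprise S(x) = log(1/p(x)) of an outcome with probability p, in bits."""
    return math.log2(1 / p)

# The rarer the outcome, the larger the surprise; a sure thing is no surprise.
for p in (1.0, 0.5, 0.25, 0.01):
    print(f"p = {p:<5}  surprise = {surprise(p):.2f} bits")
```

Running this prints 0 bits for a certain event, 1 bit for a fair coin flip, and about 6.64 bits for a 1-in-100 outcome, matching the intuition that rarer events are more surprising.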
Recall that
to be continued…