In early 2014 Tony DeRose (Senior Scientist and Lead of the Research Group at Pixar Animation Studios) and Elyse Klaidman (Director of Pixar University and Archives) approached Khan Academy with an idea. They wanted to answer a question everyone asks in school at some point: “Why do I need to learn this?” Previously, Tony had given talks that try to engage children in mathematics by demonstrating how math lives at the intersection of design and technology at Pixar. It was clear that you could motivate kids to learn math and science by showing them how concepts they encounter in school are used at Pixar to make movie magic…

It wasn’t until the 1920s that the question “how do we quantify information?” was well articulated. This video introduces a key idea of Nyquist and Hartley, who laid the groundwork for Claude Shannon’s historic equation (Information Entropy) two decades later. In these early papers, the idea of using a logarithmic function appears, something which isn’t immediately obvious to most students new to this subject. If one ‘takes this for granted’, one will miss the deeper insights which come later. So, the goal of this video is to provide intuition behind why the logarithm was the ‘natural’ choice…
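One way to see why the logarithm is a natural fit is to count messages directly. The Python sketch below is an illustration only, not something taken from the video: the function names `message_space` and `information` and the 26-symbol alphabet are assumptions made for this example. It shows that the number of possible messages multiplies as a message gets longer, while the logarithm of that number simply adds.

```python
import math


def message_space(num_symbols: int, length: int) -> int:
    """Number of distinct messages of `length` symbols drawn from `num_symbols` choices."""
    return num_symbols ** length


def information(num_symbols: int, length: int) -> float:
    """Hartley-style measure in bits: log2 of the message space."""
    return length * math.log2(num_symbols)


# Two messages sent back to back: the message spaces multiply...
short = message_space(26, 3)
longer = message_space(26, 5)
combined = message_space(26, 8)
print(combined == short * longer)

# ...but the logarithmic measure adds, the way length or weight does.
print(math.isclose(information(26, 8), information(26, 3) + information(26, 5)))
```

A measure that adds when messages are joined matches the everyday sense that two pages hold twice as much as one, which is hard to get from the raw count of possible messages.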

The following video/simulation was an attempt to bridge the gap between information as what we mean to say vs. information as what we could say. I view this as an important stepping stone towards Hartley, Nyquist and Shannon – which I will deal with next. It covers symbols, symbol rate (baud) and message space as an introduction to channel capacity. Featuring the Baudot multiplex system and Thomas Edison’s quadruplex telegraph.
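The link between symbols, symbol rate and message space can be sketched numerically. The figures used here (a two-state and a four-state signal at 10 symbols per second) are made-up illustrative values, not measurements of the Baudot or quadruplex systems, and the function names are hypothetical.

```python
import math


def message_space_after(num_symbols: int, baud: int, seconds: int) -> int:
    """Distinct messages a sender could have produced after `seconds` at `baud` symbols per second."""
    symbols_sent = baud * seconds
    return num_symbols ** symbols_sent


def bits_per_second(num_symbols: int, baud: int) -> float:
    """Rate at which log2 of the message space grows: baud * log2(num_symbols)."""
    return baud * math.log2(num_symbols)


# Same symbol rate, different number of signal states per symbol.
for states in (2, 4):
    space = message_space_after(states, baud=10, seconds=1)
    rate = bits_per_second(states, baud=10)
    print(f"{states} states: {space} possible messages per second, {rate} bits/s")
```

Under these assumptions, doubling the number of states per symbol squares the message space over the same second, which is the same as doubling the rate when measured with a logarithm.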

Play with simulations used in video on Khan Academy:

The following three-video mini-series is a bit of an engineering detour in the story of information theory. In order to easily grasp the ideas of Hartley and Shannon, I felt it would be beneficial to lay some groundwork. It began with my own selfish interest in wanting to relive some famous experiments & technologies from the 19th century. Specifically, why did the Information Age arise? When and how did electricity play a role in communication? Why was magnetism involved? Why did Morse code become so popular compared to the European designs? How was information understood before words (and concepts) such as “bit” existed? What’s the difference between static electricity and current?

All of these questions are answered as we slowly uncover a more modern approach to sending differences over a distance…

It’s powerful to understand how conditional probability can be visualized using decision trees. I wanted to create an alternative to most explanations, which often start with many abstractions. I was drawn to the idea of looking at the back pages of a choose-your-own-adventure book and deciding how you could have arrived there. Here I present a visual method using a story involving coins… allowing you to decide how to formalize. Once we grow tired of growing trees, we may ask the key question: how can we speed up this process?

This is followed by a game I designed (built by Peter Collingridge) which introduces how branches can be weighted instead of counted.
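The step from counting branches to weighting them can be sketched in a few lines of Python. The scenario below is an assumption chosen for illustration and may differ from the story in the video: one fair coin and one two-headed coin, one of them picked at random and flipped once. The names `tree` and `posterior` are hypothetical.

```python
from fractions import Fraction

# Each first-level branch is a coin with a prior weight; each second-level
# branch is a flip outcome with its weight given that coin.
tree = {
    "fair": (Fraction(1, 2), {"H": Fraction(1, 2), "T": Fraction(1, 2)}),
    "two_headed": (Fraction(1, 2), {"H": Fraction(1)}),
}


def posterior(tree, observed):
    """Weight of each path ending in `observed`, rescaled so the surviving paths sum to 1."""
    path_weights = {
        coin: prior * outcomes.get(observed, Fraction(0))
        for coin, (prior, outcomes) in tree.items()
    }
    total = sum(path_weights.values())
    if total == 0:
        raise ValueError("observed outcome is impossible in this tree")
    return {coin: weight / total for coin, weight in path_weights.items()}


# We saw heads: which coin did we most likely pick?
for coin, probability in posterior(tree, "H").items():
    print(coin, probability)
```

Multiplying weights along a path and then rescaling over the paths that match the observation is the tree version of conditioning: branches that cannot lead to what we saw are pruned, and the rest are compared by weight.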

In order to understand the subtle conceptual shifts leading to the insights behind Information Theory, I felt a historical foundation was needed. First I decided to present the viewer with a practical problem which future mathematical concepts will be applied to. Ideally this will allow the viewer to independently develop key intuitions, and most importantly, begin asking the right kind of questions:

I noticed the viewer ideas for how to compress information (reduce plucks) fell into two general camps. The first are ways of using differentials in time to reduce the number of plucks. The second are ways of making different kinds of plucks to increase the expressive capability of a single pluck. Also, hiding in the background is the problem of what to do about character spaces. Next I thought it would be beneficial to pause and follow a historical narrative (case study) exploring this problem. My goal here is to congratulate the viewer for independently realizing a previously ‘revolutionary’ idea and, at the same time, to reinforce some conceptual mechanics we will need later. It was also important to connect this video to previous lessons on the origins of our alphabet (a key technology in our story), providing a bridge from the proto-alphabets we previously explored…

This is followed by a simulation which nails down the point that each state is really a decision path.

Before jumping into Information Theory proper I decided to go back and explore the history of the Alphabet. This reminds us that communication, no matter how fluid it seems, is really just a series of selections. I’m using both Shannon and Harold Innis as inspiration for this series, which is why I’m clarifying medium vs. message as well as information transmission over space vs. time – ideas which were popularized by Marshall McLuhan years later. By starting this way I’m able to carefully move away from the semantic issues of information and towards what Shannon called the “engineering problem”. This analogy will carry through the rest of the series so it’s important to lay the groundwork early on.

Lately I have been thinking about ways of blending various aspects of history into math/science lessons on Khan Academy. The traditional model of lesson, experiment, lesson, experiment makes sense – though it’s important to do the experiment part in a natural way. All experiments begin with observations in the real world. So, I’m going to make a series of short silent videos which reenact observations made by our ancestors and the first inventions/technologies which resulted.

Later on, lessons using modern technology can reference these videos (Karl mentioned we could call them Building Blocks) as experimental foundations everyone can understand. In this case I begin with a simple video of someone finding rocks in a river with seemingly magical properties. Then these properties are harnessed to create new things. This will lead us into electromagnetism, and more modern inventions such as the telegraph. Check out the progress here

This video is an attempt to explain the Prime Number Theorem in a way that gives you a tactile intuition regarding the density of primes. It’s an idea Gauss is famous for having at the age of 16 while studying tables of prime numbers below a given size (x). The idea for this video came to me while walking in the forest and noting the gradual shift in leaf density as I moved away from the trees. I thought it could be a nice way to introduce the idea of a density gradient.
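The thinning density of primes can be checked numerically. The sketch below is an illustration rather than anything used in the video: it counts primes below x with a simple sieve of Eratosthenes and compares the count with the Prime Number Theorem estimate x / ln(x). The function names are assumptions made for this example.

```python
import math


def primes_up_to(n: int) -> list[int]:
    """All primes <= n, found with the sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            # Cross off every multiple of p, starting at p*p.
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def pnt_estimate(x: int) -> float:
    """Prime Number Theorem approximation to the number of primes <= x."""
    return x / math.log(x)


for x in (100, 10_000, 1_000_000):
    actual = len(primes_up_to(x))
    estimate = pnt_estimate(x)
    print(f"x={x}: primes={actual}, x/ln(x)={estimate:.0f}, density={actual / x:.4f}")
```

The printed density column falls as x grows, roughly like 1 / ln(x), which is the gradual thinning the leaf analogy is meant to convey.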

More importantly, check out the amazing visualization that Khan Academy user Peter Collingridge made to follow up the video:

I’ll never forget the first time I was introduced to Information Theory. My TA Mike Burrel began a lecture by writing a string of 0’s and 1’s on the board and asked us to think about what it meant. It was followed by a trance-like state of excitement… how had I not heard of this before? Three years later I’m thrilled to be launching an entire episode on the topic. It was a true joy to go back to square one and relearn the topic with a childlike curiosity… My goal is to create a Myst-inspired adventure which includes various puzzles along the way.

Figuring out a brief way to explain how & why the RSA encryption algorithm works was a daunting task. My goal was to find a balance between a rigorous 2+ hour technical explanation (for this I’d suggest Dan Boneh’s crypto course) and a simplified intuitive example. I came up…
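For readers who want to see the moving parts, here is a minimal sketch of textbook RSA with toy numbers. It is not the example from the video: the primes 61 and 53 and the exponent 17 are assumptions chosen because they are small enough to follow by hand, and real RSA uses far larger primes plus padding. It needs Python 3.8 or later for the modular inverse form of `pow`.

```python
# Toy key generation (illustrative values only, far too small to be secure).
p, q = 61, 53
n = p * q                 # public modulus
phi = (p - 1) * (q - 1)   # Euler's totient of n
e = 17                    # public exponent, chosen coprime to phi
d = pow(e, -1, phi)       # private exponent: inverse of e modulo phi


def encrypt(message: int) -> int:
    """Anyone can do this with the public pair (n, e)."""
    return pow(message, e, n)


def decrypt(ciphertext: int) -> int:
    """Only the holder of d can undo the encryption."""
    return pow(ciphertext, d, n)


message = 65  # must be smaller than n
ciphertext = encrypt(message)
print(ciphertext, decrypt(ciphertext) == message)
```

The security rests on the gap between the two directions: computing `d` is easy when `p` and `q` are known, but recovering them from `n` alone is believed to be infeasible for large enough primes.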

Check out my interactive exploration of random walks on Khan Academy Labs.

When someone rolls dice or selects a card from a shuffled deck, the best possible strategy for predicting the outcome can’t beat a blind guess. This is because each outcome is equally likely. When we apply random shifts to our messages, the result is a ciphertext which is indistinguishable from any other message – it contains no information. The problem with this method of encryption (the one-time pad) is that we must share all the random shifts in advance. What happens when we apply pseudorandom shifts instead? We can relax our definition of perfect secrecy and achieve practical security…
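The difference between the two approaches can be sketched in Python. This is an illustration under stated assumptions, not a secure implementation: messages are limited to uppercase A-Z, the function names are hypothetical, and `random.Random` stands in for a pseudorandom generator even though it is not cryptographically secure.

```python
import random
import secrets
import string

ALPHABET = string.ascii_uppercase


def apply_shifts(text: str, shifts: list[int], direction: int = 1) -> str:
    """Shift each letter by its own amount; direction=-1 undoes the shifts."""
    if len(shifts) != len(text):
        raise ValueError("need exactly one shift per letter")
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + direction * s) % 26]
        for ch, s in zip(text, shifts)
    )


def random_shifts(length: int) -> list[int]:
    """One-time pad: the whole list must be shared with the receiver in advance."""
    return [secrets.randbelow(26) for _ in range(length)]


def pseudorandom_shifts(seed: int, length: int) -> list[int]:
    """Pseudorandom pad: only the short seed is shared; both sides expand it."""
    rng = random.Random(seed)  # illustration only, not a secure generator
    return [rng.randrange(26) for _ in range(length)]


message = "ATTACKATDAWN"

pad = random_shifts(len(message))
print(apply_shifts(apply_shifts(message, pad), pad, direction=-1))

# Sender and receiver each rebuild the same shifts from the shared seed.
sent = apply_shifts(message, pseudorandom_shifts(2024, len(message)))
print(apply_shifts(sent, pseudorandom_shifts(2024, len(message)), direction=-1))
```

The trade-off is in what has to be shared: a pad as long as the message, or a seed of fixed size. With a seed, the number of possible shift sequences is capped by the number of possible seeds, which is why the guarantee drops from perfect secrecy to practical security.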