Advanced Python Data Science: Monte Carlo Simulations | Juni Learning

POSTED ON APRIL 10, 2020

The value of pi, found through Monte Carlo simulations and written on a whiteboard.

Introduction

In this lab, Juni instructor Ritika teaches us how to use Monte Carlo simulations to estimate the value of π. Pi (π) is a mathematical constant, the ratio of a circle's circumference to its diameter, with a value of approximately 3.14159 (22/7 is a common rough approximation). Most students are familiar with π through finding the area of a circle.

Learn what Monte Carlo simulations are, how to translate mathematical expressions into code, how to use data from simulations to create a scatter plot, and how to read scatter plots to understand outcomes.

Project Overview

Who is this project for?

  • Language: Python

  • Juni Level: Python Level 3

  • Recommended Coding Experience: Advanced coders (3+ years of coding experience). Experience in Python, MATLAB, or R is preferred but not required.

  • Core concepts: Nested loops, Using Dataframes, Dictionaries, Lists, Creating Scatter Plots

  • Lines of Code: ~40

  • Difficulty: Hard

  • Est. Time: ~1hr, but this varies by student!

What should I know before this project?

  • How to create a dataframe using the Python package pandas

  • How to create a scatter plot using matplotlib.pyplot

Learning Outcomes

After completing this project, the student should:

  • Know what Monte Carlo simulations are

  • Understand how we can translate mathematical expressions to code

  • Use data from simulations to create a scatter plot

  • Read a scatter plot to understand the outcome of the Monte Carlo simulation

Get started below!

Monte Carlo Simulations: What are they?

Simply put, Monte Carlo simulations use random sampling to obtain numerical results. For example, if we are given an unfair coin and we want to determine the probability of heads and tails, we can use random sampling to obtain a result. Random sampling is the process of randomly selecting a sample to represent an entire population.

In this case, we can use a sample size of perhaps 1,000 coin flips. On every flip, we record whether the coin landed on heads or tails. After all of the trials, we use the percentage of heads and tails we obtained as an estimate of the probability of the coin landing on heads or tails on any given flip.
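The coin example above can be sketched in a few lines of Python. This is only an illustration: the function name estimate_heads_probability and the bias of 0.7 are made up for the example, since with a real unfair coin the true probability would be unknown.

```python
import random


def estimate_heads_probability(num_flips, true_p_heads=0.7, seed=None):
    """Estimate P(heads) for an unfair coin by simulating num_flips flips.

    true_p_heads is a hypothetical bias used only to drive the simulation;
    in a real experiment we would not know it.
    """
    rng = random.Random(seed)
    heads = 0
    for _ in range(num_flips):
        # rng.random() returns a float in [0.0, 1.0); values below
        # true_p_heads count as heads.
        if rng.random() < true_p_heads:
            heads += 1
    return heads / num_flips


estimate = estimate_heads_probability(1000, seed=42)
print(f"Estimated P(heads): {estimate:.3f}")
print(f"Estimated P(tails): {1 - estimate:.3f}")
```

With more flips, the estimate tends to settle closer to the coin's true bias, which is the same behavior we will look for when estimating π.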

Behind the Scenes

Dartboards and Probability

In this project, we will be using Monte Carlo simulations to determine the value of π. Let's think about this conceptually first. Let’s pretend we have a dartboard. This dartboard has a unique design:

dartboard

As we can see, this dartboard is in the shape of a square. Inside the square, there is a shaded circle which is centered and tangent to each side of the square. The circle has a radius of r and the square has a side length of 2r.

Let’s say we have a dart thrower who always manages to hit the board and is equally likely to hit any area of the board. What is the probability, p(hit), that the dart will hit the shaded circle? To solve this, let’s break it down:

probability that the dart will hit the shaded circle

We know the dart player always hits the dartboard (within the square), so this probability can also be expressed as:

Another way to express probability that the dart will hit the shaded circle

Using the radius of the circle and the side length of the square, we can finally reduce this to:

Reduced probability that the dart will hit the shaded circle

And so,

Probability rewritten

Using what we initially know about p(hit), we can finally rewrite this as:

Replacing p(hit)
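Written out, the derivation described in the steps above might look like the following. The symbols N_hit (the number of darts that land in the circle) and N (the total number of darts thrown) are introduced here for illustration, and the first line is an approximation because it comes from a finite number of throws.

```latex
\begin{align*}
p(\text{hit}) &\approx \frac{N_{\text{hit}}}{N} \\
p(\text{hit}) &= \frac{\text{area of circle}}{\text{area of square}} \\
p(\text{hit}) &= \frac{\pi r^{2}}{(2r)^{2}} = \frac{\pi}{4} \\
4 \cdot p(\text{hit}) &= \pi \\
4 \cdot \frac{N_{\text{hit}}}{N} &\approx \pi
\end{align*}
```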

TIP: Try doing the math above by yourself. It’ll help reinforce the concepts we used to get to the final step!

Coding Overview

Let’s talk about the left hand side of the equation. This is what we want to code.

Let’s map out the steps in order to simulate this project:

  1. Write out the code to simulate one dart being thrown on the board. We’ll use 1 to represent hitting the circle and 0 to represent a miss. Because this setup is a little tricky, the starter code is below for you to begin.

  2. Repeat this code for 1,000 darts and make sure to record the hits or misses (1s and 0s). What data structure can we use for this? (HINT 1)

  3. With the simulation we have now, repeat these steps for a trial consisting of 1 dart, 2 darts, …, 1000 darts. Make sure to record the total number of hits for each trial. What other data structure can we use for this? (HINT 2)

  4. Using the formula we’ve derived, tweak your code to fit the expression above. This will be the estimated value of pi.

  5. Now you should have a data structure which stores each trial, and the estimated value of pi from that trial. Can we use this data to create a scatter plot? Using pandas, create a dataframe using your data and create a scatter plot. The x-axis should be “Number of Trials” and the y-axis should be “Estimated Value of Pi”.

  6. Create a horizontal line at y=3.14. We’ll use this to compare our results with the approximate true value of pi.

Guiding Questions

  1. What do you notice about the shape of your graph as the number of trials increases? Why do you think this happens?

  2. Can you think of another situation where Monte Carlo simulations may be helpful for us?

  3. What are the pros and cons of using Monte Carlo simulations?

Code Starter

Throwing One Dart

This is the code that simulates one dart being thrown at the board. For simplicity, we assume the circle is centered at (0,0) with a radius of 0.5, which is saved in r. We then use Python's random module to pick the dart's coordinates: random.random() returns a random float that is at least 0 and less than 1.

Because our board extends only from -0.5 to 0.5 in both the x and y directions, we subtract 0.5 from each random float so that the dart always lands on the board.

Now we need to figure out whether the dart has landed on the circular part of the board. Remember, the radius of the circle is 0.5. This means that the distance between the perimeter of the circle and the center is 0.5. Thus, if the distance between the coordinate (x,y) and (0,0) is greater than 0.5, the dart has landed outside the circle.

The following code applies the Euclidean distance formula (often taught in geometry and trigonometry classes). If the distance is less than r, which is 0.5, we have a hit (1). Otherwise, we have a miss (0).

Euclidean distance formula
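A minimal sketch of the single-dart code described above is shown here. It follows the description (radius saved in r, random.random() shifted by 0.5, Euclidean distance to the center), but the function name throw_dart is an assumption made for this example and may differ from the original starter code.

```python
import math
import random

r = 0.5  # radius of the circle, which is centered at (0, 0)


def throw_dart():
    """Simulate one dart; return 1 for a hit inside the circle, 0 for a miss."""
    # random.random() is in [0, 1); subtracting 0.5 shifts it to [-0.5, 0.5),
    # so the dart always lands on the square board.
    x = random.random() - 0.5
    y = random.random() - 0.5
    # Euclidean distance from the point (x, y) to the center (0, 0).
    distance = math.sqrt(x ** 2 + y ** 2)
    return 1 if distance < r else 0


print(throw_dart())
```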

Creating a Line for Pi

This line of code creates a horizontal line at y = 3.14.

horizontal line at y =3.14
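One way to draw this line, together with the scatter plot it is compared against, is sketched below. It assumes matplotlib's axhline is an acceptable way to draw the reference line (the original line of code may differ), and the helper estimate_pi and the styling choices are hypothetical stand-ins so that the example has data to plot.

```python
import random

import matplotlib.pyplot as plt
import pandas as pd


def estimate_pi(num_darts):
    """Estimate pi from num_darts simulated throws at the board."""
    hits = 0
    for _ in range(num_darts):
        x = random.random() - 0.5
        y = random.random() - 0.5
        if x * x + y * y < 0.25:  # squared distance vs. r squared (0.5 ** 2)
            hits += 1
    return 4 * hits / num_darts


# Stand-in data: one estimate per trial size, from 1 dart up to 1,000 darts.
estimates = {n: estimate_pi(n) for n in range(1, 1001)}

df = pd.DataFrame({
    "Number of Trials": list(estimates.keys()),
    "Estimated Value of Pi": list(estimates.values()),
})

fig, ax = plt.subplots()
df.plot.scatter(x="Number of Trials", y="Estimated Value of Pi", s=5, ax=ax)

# Horizontal reference line at y = 3.14 for comparing the estimates.
pi_line = ax.axhline(y=3.14, color="red", linestyle="--", label="y = 3.14")
ax.legend()
```

In a script, calling plt.show() afterwards displays the figure; in a notebook it usually renders on its own.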

Hints

  1. A list. Your list should consist of 1s and 0s depending on whether each dart resulted in a hit or a miss. The length of the list should be equal to the number of trials conducted. In this case, that’s the number of darts thrown.

  2. A dictionary. Your dictionary should look like {1:(estimated value of pi for 1 trial), 2:(estimated value of pi for 2 trials), 3:(estimated value of pi for 3 trials)...1000:(estimated value of pi for 1000 trials)}
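The two data structures from the hints could be built along these lines. This is a sketch under assumptions, not Ritika's solution: the names throw_dart, run_trial, and estimate_pi_by_trial are invented for this example.

```python
import random


def throw_dart(r=0.5):
    """Return 1 if a random dart lands inside the circle, otherwise 0."""
    x = random.random() - 0.5
    y = random.random() - 0.5
    return 1 if (x ** 2 + y ** 2) ** 0.5 < r else 0


def run_trial(num_darts):
    """HINT 1: a list of 1s and 0s, one entry per dart thrown."""
    return [throw_dart() for _ in range(num_darts)]


def estimate_pi_by_trial(max_darts=1000):
    """HINT 2: a dictionary mapping each trial size to its estimate of pi."""
    estimates = {}
    for num_darts in range(1, max_darts + 1):
        results = run_trial(num_darts)
        # 4 * (number of hits / number of darts thrown) approximates pi.
        estimates[num_darts] = 4 * sum(results) / num_darts
    return estimates


estimates = estimate_pi_by_trial(1000)
print(estimates[1000])
```

A dictionary keyed by trial size also converts easily into the two columns needed for the scatter plot.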

Try to do the lab on your own first! If you get stuck or want to check your solution, check out Ritika’s solution code below.

Project Solution

Check your answers or get help by viewing Ritika's lab solution code.

Want to keep learning?

We hope you enjoyed this project tutorial!

To keep practicing and learning, please check out all of our coding tutorials on our blog.

Need help?

Looking up your coding questions is one of the best ways to learn!

Another great way to learn is from an experienced coder or instructor. Juni CS instructors like Ritika work closely with students ages 8-18, and are specially trained to adapt to each child's unique learning style, pace, and interests.

Read more about our coding courses and curriculum, or get started with our Admissions Team to learn which course is best for your child’s coding journey. You can also read more about how we use Scratch to teach coding.