Big O Notation

Big O notation might look something like this: O(log N)

N = number of input data
O( ) = function notation for Big O (just tells you we're using Big O notation)
log N = the function that describes the efficiency of the procedure or algorithm as N grows bigger and bigger

Programmers use Big O Notation as a way to express how efficient an algorithm or procedure is at performing its designated task (e.g. sorting, searching). The function tells us the maximum number of “actions” it will take for the algorithm to achieve its goal, given N data.

Example 1: O(N)

O(N) is the linear case. Given 10 data, the worst-case scenario is that it will take 10 “actions” to search for a target, or sort the data, or whatever the goal of the algorithm is. Given 100 data, it will take 100 actions. Given 2000 data, it will take 2000 actions. Given N data, it will take N actions.

A real life example would be if you were looking for a book on your friend’s bookshelf.

Let's say you want your program to find any given book. You could tell the program to start checking books one by one from the left. If you wanted Harry Potter 3, it would only take 3 actions (or tries), because it's the third book in. However, Big O tells us the worst-case scenario. Which book would cause the algorithm to take the most possible actions? Harry Potter 7: it's the last book, so the algorithm would have to check all 7 books. Given 7 books, it took the algorithm 7 actions / tries to find the book in the worst-case scenario.

What if we ran this algorithm on 10 books? The worst case would take all 10 tries. How about 1,000,000 books? It would take 1 million tries.

So, TL;DR — O(N) tells us that given N data, it will take an algorithm N actions / tries to accomplish a task.
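The bookshelf example can be sketched as a simple linear scan. This is a minimal, hypothetical helper (`linear_search` is my name for it, not code from any library):

```ruby
# Linear search: check each book from the left until we find the target.
# Worst case: the target is the last book, so we make N comparisons.
def linear_search(books, target)
  books.each_with_index do |book, index|
    return index if book == target  # found it after index + 1 tries
  end
  nil  # target isn't on the shelf at all
end

shelf = ["HP 1", "HP 2", "HP 3", "HP 4", "HP 5", "HP 6", "HP 7"]
linear_search(shelf, "HP 3")  # => 2 (found on the 3rd try)
linear_search(shelf, "HP 7")  # => 6 (worst case: all 7 tries)
```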

Example 2: O(1)

What does O(1) tell us? The function inside is simply “1”. This means that no matter what number N is, it will always accomplish the task in 1 try. It represents the most efficient algorithm possible.

Does this exist? Sure it does! But it might not give you exactly what you’re looking for. Here’s an algorithm that can be described with O(1).

def efficient_but_useless
  return true  # one action, no matter the input
end

TL;DR algorithms with O(1) always finish in 1 try.

Example 3: O(N²) or any higher exponent

Is this an efficient algorithm? In other words, if I input a large amount of data for N, what does that tell me about the algorithm? If I input N = 1, then no matter what, it will take 1 try. But if I put in N = 10, then it will take 10² = 100 actions / tries to complete the algorithm (worst-case scenario). Imagine if the exponent were higher, or if N were larger. Then the algorithm would be super inefficient!

What kinds of algorithms would have O(N²)? Simply put, algorithms with nested loops usually exhibit this behavior. If you have to run a procedure on each item, and each item has to check itself against every other item in some way, then worst case is that you run the procedure N² times. In other words, it takes N² actions.
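Here's a minimal sketch of that nested-loop shape: a hypothetical `has_duplicates?` check where every item compares itself against every other item, giving roughly N² comparisons in the worst case (no duplicates found):

```ruby
# O(N^2): the outer loop runs N times, and for each item the inner
# loop runs over all N items again, so we do about N * N comparisons.
def has_duplicates?(items)
  items.each_with_index do |a, i|
    items.each_with_index do |b, j|
      next if i == j           # don't compare an item with itself
      return true if a == b    # found a duplicate, stop early
    end
  end
  false  # worst case: we made every comparison and found nothing
end

has_duplicates?([1, 2, 3])  # => false (the worst case: all N^2 checks)
has_duplicates?([1, 2, 1])  # => true
```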

Example 4: O(log N)

This time the function is logarithmic.

Imagine that the x-axis represents N, how much input data you have, and the y-axis tells us how many actions or tries it takes the algorithm to finish (worst case scenario, as usual). As N grows larger, the number of tries grows, but slowly.

Let’s try calculating a few examples. Keep in mind that when programmers write log N, it refers to log base 2. Also, since the number of actions has to be a whole number, if you get a decimal, the rule of thumb is to round up. So given 10 data, it will take log(10) ≈ 3.32, which rounds up to 4 actions, max. That isn’t much more efficient than O(N), but what if N = 1,000,000? It would take your algorithm log(1,000,000) ≈ 19.93, which rounds up to 20 tries, max. For 1,000,000 data, that’s pretty efficient!
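That round-up rule is easy to check in Ruby, which ships with a base-2 log in its standard Math module. `max_tries` is a hypothetical name for the calculation:

```ruby
# Worst-case number of tries for an O(log N) algorithm on n items:
# log base 2 of n, rounded up to a whole number of actions.
def max_tries(n)
  Math.log2(n).ceil
end

max_tries(10)         # => 4  (log2(10) is about 3.32)
max_tries(1_000_000)  # => 20 (log2(1,000,000) is about 19.93)
```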

When would we use a logarithmic function to describe the efficiency of an algorithm? One example is the binary search. The binary search algorithm finds a target value within a set of sorted data by repeating the process below until you’re left with one value.

  1. Find the middle value. If there is more than one, take the higher value.
  2. Compare the target value with the middle value. If it’s equal, then we’ve found the target. Otherwise, if the target > middle value, then eliminate all values to the left of the middle (lower than the middle); we know the target is in the half with greater values. If the target is <= middle, eliminate the right.
  3. Repeat steps 1 and 2 on the remaining side. Continue until you are left with one value; this is your target.

So let’s say this is your data, and you’re looking for the value 4.

data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
target = 4
  1. Find the middle value. There are 10 values, thus two middles. Take the higher: 5.
middle = 5
[0, 1, 2, 3, 4]; [5, 6, 7, 8, 9]

# left side = [0, 1, 2, 3, 4]
# right side = [5, 6, 7, 8, 9]
  2. Compare target with middle. Is it equal to 4? No. Since target <= middle (4 <= 5), eliminate the right side.

  3. Repeat steps 1 and 2 on the left side. We’ll keep doing this until we find the target value, 4.

middle = 2
[0, 1, 2, 3, 4]

# left side = [0, 1]
# right side = [2, 3, 4]

Check: is target = middle? No. So on to step 2, compare: target > middle (4 > 2). Use the right side.

middle = 3
[2, 3, 4]

# left side = [2]
# right side = [3, 4]

target = middle? No. So compare again: target > middle (4 > 3). Use the right side.

middle = 4  # note, we always take the higher of two as the middle for an even set.
[3, 4]

# left side = [3]
# right side = [4]

Target = middle? Yes. We’ve found the target and it took 4 tries!
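The walkthrough above can be sketched in Ruby. This is one possible version, not *the* binary search: `binary_search_tries` is a hypothetical helper that counts comparisons instead of returning an index, so it matches the try-counting we just did, and it follows the same convention of taking the higher middle and keeping the middle in the right half:

```ruby
# Binary search, mirroring the steps above: compare against the higher
# middle, then keep only the half that can still contain the target.
def binary_search_tries(data, target)
  tries = 0
  while data.length > 1
    tries += 1
    mid_index = data.length / 2          # the higher middle when the count is even
    middle = data[mid_index]
    return tries if middle == target
    if target > middle
      data = data[mid_index..-1]         # eliminate everything left of the middle
    else
      data = data[0...mid_index]         # eliminate the middle and everything right of it
    end
  end
  tries += 1                             # one last comparison against the final value
  data[0] == target ? tries : nil
end

sorted = (0..9).to_a
binary_search_tries(sorted, 4)  # => 4, the worst case for 10 values
binary_search_tries(sorted, 5)  # => 1, the middle is found on the first try
```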

Now let’s make sense of this as a logarithmic function: O(log N). We saw above that the calculations work, but why is it log base 2, and why log? If you think of logs as the inverse of exponents, and exponents as repeated multiplication, then logs are repeated division. Where are we dividing in the above procedure? Every step! We’re cutting each set in half over and over again until we reach the target, hence repeated division. Why is it log base 2? Log base 2 simply means that we keep dividing by 2, which makes sense because we’re cutting the data set in half.

I chose the target number 4 in the binary search method above because it represents the worst case scenario for a set of 10 data. It took 4 tries / actions, and log(10), again with an assumed base 2 since we’re in the world of programming, gives us 3.32 which rounds up to 4. Tada!

If you want to see the binary search method or linear methods in action, here’s a rather straightforward video demonstrating what I showed above.

Conclusion and Resources

I hope that this post helps clarify Big O notation for those of you who are brand new to the idea. A few other resources I feel are really well done: this visual, real-world explanation and this much more technical explanation. I kept the code in this post light because these algorithms are all solved problems already, and with some Google sleuthing, you will find repos full of them in a multitude of languages. One more resource I like is the Big O cheat sheet.

Happy coding!