Ruby Array Set Intersection: A Rails Use-Case and Algorithm Analysis in C
Here’s some smelly code I wrote for the Flatiron School pre-work progress-tracker app.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
add_lesson is brittle. A new coder on this project has to know to add lessons using this method; the design is therefore NOT transparent. There is also a duplication of data: calling
lesson_order.length gives you the total number of lessons, making a
total_lessons attribute a violation of the DRY principle. It should be a method instead.
topic_complete? also felt prone to false positives and negatives. If an admin replaces a lesson, is a completed progress still complete? What about if a lesson is removed? Our method would return false.
I needed a way to actually compare what was in
Set Theory in Full Motion
Set theory review: The intersection of two sets is the set of all common elements. Ruby’s Array class has such a method which can be used like an operator:
The & operator is called the set intersection operator.
Failed Solution, Learned Lesson
1 2 3
This failed because the order in which the comparison happens matters.
1 2 3 4 5 6 7 8 9 10 11
1 2 3
BONUS: The Algorithm Behind the Set Intersection Operator
I took a a deep dive into Ruby source code to figure this one out. Full disclosure: I’ve never read or written C in my life, but with the help of the MRI Identifier Search, I was able to piece together how the algorith works.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
The key to understanding this is in the
- ary1 and ary2 represent the arrays for which we’re finding the intersection. ary3 is what the function
- ary2 is turned into a hash table so that elements from ary1 can be easily matched up.
- v represents
ary1[i], in Ruby-speak.
- vv is somehow tied to that value.
st_deletereturns 1 if
vvwas successfully deleted from
table, which really refers to ary2, and 0 if it wasn’t found. If it gets deleted, then that means ary2 must have had an element in common; push that element into ary3.
In the worst-case scenario, the algorithm iterates through ary2 to create a table, giving us O(x) for x elements in ary2. Then, given there are y elements in ary1, we iterate through ary1 y times and look for the value in the hash table, giving us O(y). Therefore, in total, the algorithm has a big-o of O(x+y).