Ruby Array Set Intersection: A Rails Use-Case and Algorithm Analysis in C

Here’s some smelly code I wrote for the Flatiron School pre-work progress-tracker app.

Brittle Methods

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
class Progress < ActiveRecord::Base
  belongs_to :topic
  serialize :completed_lessons, Array

  def topic_complete?
    topic.total_lessons == completed_lessons.length
  end
end

class Topic < ActiveRecord::Base
  has_many :progresses
  serialize :lesson_order, Array

  def add_lesson(lesson)
    lesson_order << lesson.id
    total_lessons += 1
  end
end

add_lesson is brittle. A new coder on this project has to know to add lessons using this method; the design is therefore NOT transparent. There is also a duplication of data: calling lesson_order.length gives you the total number of lessons, making a total_lessons attribute a violation of the DRY principle. It should be a method instead.

topic_complete? also felt prone to false positives and negatives. If an admin replaces a lesson, is a completed progress still complete? What about if a lesson is removed? Our method would return false.

I needed a way to actually compare what was in completed_lessons and lesson_order.

Set Theory in Full Motion

Set theory review: The intersection of two sets is the set of all common elements. Ruby’s Array class has such a method which can be used like an operator: &.

The & operator is called the set intersection operator.

how-&-works
1
[ 1, 1, 3, 5 ] & [ 1, 2, 3 ]    #=> [ 1, 3 ]

Failed Solution, Learned Lesson

1
2
3
def topic_complete?
  completed_lessons & topic.lesson_order == topic.lesson_order
end

This failed because the order in which the comparison happens matters.

order-matters
1
2
3
4
5
6
7
8
9
10
11
lesson_order = [3, 2, 5]
completed_lessons = [5, 2, 3]

lesson_order & completed_lessons
   # => [3, 2, 5]

completed_lessons & lesson_order
   # => [5, 2, 3]

completed_lessons & lesson_order == lesson_order
   # => false

Win

1
2
3
def topic_complete?
  topic.lesson_order & completed_lessons == topic.lesson_order
end

BONUS: The Algorithm Behind the Set Intersection Operator

I took a a deep dive into Ruby source code to figure this one out. Full disclosure: I’ve never read or written C in my life, but with the help of the MRI Identifier Search, I was able to piece together how the algorith works.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
 static VALUE
rb_ary_and(VALUE ary1, VALUE ary2)
{
    VALUE hash, ary3, v;
    st_table *table;
    st_data_t vv;
    long i;

    ary2 = to_ary(ary2);
    ary3 = rb_ary_new();
    if (RARRAY_LEN(ary2) == 0) return ary3;
    hash = ary_make_hash(ary2);
    table = rb_hash_tbl_raw(hash);

    for (i=0; i<RARRAY_LEN(ary1); i++) {
        v = RARRAY_AREF(ary1, i);
        vv = (st_data_t)v;
        if (st_delete(table, &vv, 0)) {
            rb_ary_push(ary3, v);
        }
    }
    ary_recycle_hash(hash);

    return ary3;
}

The key to understanding this is in the for loop.

  • ary1 and ary2 represent the arrays for which we’re finding the intersection. ary3 is what the function rb_ary_and returns.
  • ary2 is turned into a hash table so that elements from ary1 can be easily matched up.
  • v represents ary1[i], in Ruby-speak.
  • vv is somehow tied to that value.
  • st_delete returns 1 if vv was successfully deleted from table, which really refers to ary2, and 0 if it wasn’t found. If it gets deleted, then that means ary2 must have had an element in common; push that element into ary3.

O(x+y)

In the worst-case scenario, the algorithm iterates through ary2 to create a table, giving us O(x) for x elements in ary2. Then, given there are y elements in ary1, we iterate through ary1 y times and look for the value in the hash table, giving us O(y). Therefore, in total, the algorithm has a big-o of O(x+y).

Comments