Skip to main content

Count inversions to sort in descending (or ascending) order in Python

I get to think of the Kendall Tau Correlation between predicted cell orders and ground truth cell orders by counting how many swaps of adjacent cells are needed to sort the predicted order into the ground truth oder, after saw the Kagle Competition: Google AI4 Code– Understand Code in Python Notebooks.
The Kendall Tau Correlation is the following formula:
$$K = 1 - 4 \frac{\sum_i S_{i}}{\sum_i n_i(n_i - 1)}$$
where $S_{i}$ is the number of inversions in the predicted and $n_{i}$ is the number of cells.
If you have a list: [3, 2, 1] to be correctly sorted into the ground truth: [1, 2, 3], what the number of swaps is needed?
[3, 2, 1] -> [2, 3, 1] -> [2, 1, 3] -> [1, 2, 3]
So, there you have 3 swaps to get the correct form. The ground truth is a list in ascending order.
In the way to sort in ascending order, an inversion within a sequens A is counted and swaped each adjacent cell into the other when $i < j$ but $A[i] > A[j]$.
def count_inversions(predictions):
    inversions = 0
    size = len(predictions)
    for i in range(size):
        for j in range(i+1, size):
            if predictions[i] > predictions[j]:
                inversions += 1
    return inversions
a = [5, 4, 3, 2, 1]
count_inversions(a)
>>> 10
In the worst case, a list with with n elements will need $\frac{1}{2}n (n - 1)$ inversions. This function has double for-loops: an outer loop and a nested inner loop which iterate through all the elements in the input list. The complexity of this algorithm is quadratic and $O(n^2)$.
In counting invertions it is faster to use Python's built-in bisection function. I extracted the code to use bisection from competition-metric-kendall-tau-correlation
import bisect

def count_inversions_bisect(predictions):
    inversions = 0
    sorted_so_far = []
    for i, u in enumerate(predictions):
            j = bisect.bisect(sorted_so_far, u)
            inversions += i - j
            sorted_so_far.insert(j, u)
    return inversions
a = [5, 4, 3, 2, 1]
inversions = count_inversions_bisect(a)
print(inversions)
>>> 10

Comments