Count inversions to sort in descending (or ascending) order in Python

I get to think of the Kendall Tau Correlation between predicted cell orders and ground truth cell orders by counting how many swaps of adjacent cells are needed to sort the predicted order into the ground truth oder, after saw the Kagle Competition: Google AI4 Code– Understand Code in Python Notebooks.
The Kendall Tau Correlation is the following formula:
$$K = 1 - 4 \frac{\sum_i S_{i}}{\sum_i n_i(n_i - 1)}$$
where $S_{i}$ is the number of inversions in the predicted and $n_{i}$ is the number of cells.
If you have a list: [3, 2, 1] to be correctly sorted into the ground truth: [1, 2, 3], what the number of swaps is needed?
[3, 2, 1] -> [2, 3, 1] -> [2, 1, 3] -> [1, 2, 3]
So, there you have 3 swaps to get the correct form. The ground truth is a list in ascending order.
In the way to sort in ascending order, an inversion within a sequens A is counted and swaped each adjacent cell into the other when $i < j$ but $A[i] > A[j]$.

def count_inversions(predictions):
    inversions = 0
    size = len(predictions)
    for i in range(size):
        for j in range(i+1, size):
            if predictions[i] > predictions[j]:
                inversions += 1
    return inversions

a = [5, 4, 3, 2, 1]
count_inversions(a)
>>> 10

In the worst case, a list with with n elements will need $\frac{1}{2}n (n - 1)$ inversions. This function has double for-loops: an outer loop and a nested inner loop which iterate through all the elements in the input list. The complexity of this algorithm is quadratic and $O(n^2)$.
In counting invertions it is faster to use Python's built-in bisection function. I extracted the code to use bisection from competition-metric-kendall-tau-correlation

import bisect

def count_inversions_bisect(predictions):
    inversions = 0
    sorted_so_far = []
    for i, u in enumerate(predictions):
            j = bisect.bisect(sorted_so_far, u)
            inversions += i - j
            sorted_so_far.insert(j, u)
    return inversions

a = [5, 4, 3, 2, 1]
inversions = count_inversions_bisect(a)
print(inversions)
>>> 10

MeePythoner

Search This Blog

Count inversions to sort in descending (or ascending) order in Python

Labels

Comments

Post a Comment

Popular posts from this blog

Schemdraw AttributeError: 'Text' object has no property 'math_fontfamily'

Iterator object in Python

Convert string to datetime in Pandas DataFrame, python

scipy.optimize.curve_fit() in Python

DOT Language with Graphviz to render in Pyhton

Downsampling using convolutional neural networks

Concatenate strings from two or more columns into one column in pandas.DataFrame