An Optimised Group Pairing (Dating) Algorithm

Finding Everyone a Match in CoMatch ❤️

6 min readFeb 23, 2024

The Unutilised Algorithm

Valentine’s Day 2024 has recently just passed, and inevitably, some of us remain without a significant other to celebrate this love-filled occassion with. Every year, the Singapore Society of universities in London, consisting of KCL, ICL, LSE and UCL, organise a matchmaking event to help those without a partner to potentially find a date. Even though the event, aptly named “CoMatch”, is clearly marketed towards finding romantic partners, some participants participate to find merely platonic friendships (or are they? 💭). This year, upon receiving hearsay that an online matching algorithm would be used to match participants, I eargerly registered myself to investigate.

This year’s CoMatch certainly was conceptually interesting. Participants were blindfolded and conversed with other individuals within their group. It was a literal blind date in a speed dating atmosphere! However, to my disappointment, the event coordinators decided not to use the algorithm they found online, due to constrains regarding the number of participants, as well as technical difficulties in the input format (Disclaimer: I do not know the exact reason for the change in procedure. These are my stipulations from hearsay). Instead, participants have to ask each other to match, Single’s Inferno-style, leaving some with no matches, staying in Inferno, while the rest enjoy their time in Paradise.

Admittedly, there was no Inferno or Paradise in CoMatch. Despite that, being left without a match would still leave a sour feeling in one’s mouth, alongside embarrassment from peers. To prevent that from happening in similar future events, let’s write a Python program to match everyone in the group!

The Details

Before writing the code proper, its important to outline the process and parameters in our CoMatching algorithm.

Participants are grouped into groups of size N. For simplicity, N is an even number.
Each participant in the group talks to every other participant in the group.
Each participant ranks every other participant from 1 to N-1, with 1 being most preferred and N-1 being least preferred.
Participants are then optimally sorted into pairs based on how they ranked one another.

With that out of the way, let’s start coding!

The Program

Firstly, the program must be able to take in a form of input which is accessible by anyone without a background in programming. In this case, we use Microsoft Excel, which is fairly intuitive and widely utilised. Ranking results are entered into an Excel file, with the rankers (participants ranking others) as rows and rankees (participants being ranked) as columns:

The Excel input file. Rows: rankers, columns: rankees. Border grids are drawn and cells are coloured for clarity, but it is not necessary for the program to work.

The Pandas package in Python reads the excel file and converts it into a 2-dimensional list.

df = pd.read_excel(sys.argv[1], index_col=[0], header=[0])

After some boring (but necessary) lines of code to ensure that the file has been read properly and that every participant has ranked everyone else correctly, we go on to pair the participants based on their ranking of each other via different methods. In order to do that, we have to create a function which generates all possible arrangements of pairs when given a list of participants (an even number). To do this, the help of StackOverflow has been employed.

def all_pairs(lst):
    # https://stackoverflow.com/questions/5360220/how-to-split-a-list-into-pairs-in-all-possible-ways
    if len(lst) < 2:
        yield []
        return
    if len(lst) % 2 == 1:
        # Handle odd length list (even though this is not necessary)
        for i in range(len(lst)):
            for result in all_pairs(lst[:i] + lst[i+1:]):
                yield result
    else:
        a = lst[0]
        for i in range(1,len(lst)):
            pair = (a,lst[i])
            for rest in all_pairs(lst[1:i]+lst[i+1:]):
                yield [pair] + rest

Then, we attempt to figure out the optimum arrangement using three different methods.

Total Utility

In the Total Utility method, we iterate through each possible arrangement, calculating a score for each arrangement by summing up every participant’s rank of their match in the pair. Since a lower rank number is better (1 being the best), the lowest score out of all the arrangements is the optimum arrangement.

def util(df):
    
    people = list(df.index)
    all_arrangements = []
    for arrangement in all_pairs(people):
        all_arrangements.append(arrangement)

    # since a lower score is better, we start the score at infinity
    highest_score = math.inf
    optimal_arrangement = []

    # iterate over all arrangements, find optimal arrangement based on util score
    for arrangement in all_arrangements:
        score = 0

        # increment score based on how they rank each other
        for pair in arrangement:
            score = score + df[pair[1]][pair[0]]
            score = score + df[pair[0]][pair[1]]

        # update optimal arrangement if lowest score (thus far)
        if score < highest_score:
            highest_score = score
            optimal_arrangement = arrangement

    return optimal_arrangement

2. Least Squares

The Total Utility method exists a nagging problem. As with other solutions for other problems involving maximising utility, the minority might be disadvantaged as consequence. In this case, two participants might be matched with each other even though they have ranked each other rather badly, just because the other participants in the group gets paired up perfectly as a result. To remedy this, the Least Squares approach sums up each participants rank of their match, squared, inflating large values. This discourages the algorithm from matching pairs of participants that hates each other with a vengence.

def leastsq(df):
    
    people = list(df.index)
    all_arrangements = []
    for arrangement in all_pairs(people):
        all_arrangements.append(arrangement)
    
    # since a lower score is better, we start the score at infinity
    highest_score = math.inf
    optimal_arrangement = []

    # iterate over all arrangements, find optimal arrangement based on squared score
    for arrangement in all_arrangements:
        score = 0

        # increment score based on how they rank each other, squared
        for pair in arrangement:
            score = score + (df[pair[1]][pair[0]]) ** 2
            score = score + (df[pair[0]][pair[1]]) ** 2

        # update optimal arrangement if lowest score (thus far)
        if score < highest_score:
            highest_score = score
            optimal_arrangement = arrangement

    return optimal_arrangement

3. Threshold

Last but not least, Threshold iterates over all arrangements, finding arrangements where all participants do not rank their match above a certain threshold. The threshold starts strict, at 1, and when no possible arrangements are found, it slowly increments itself by 1 and terminates when a possible arrangement is found. If more than one possible arrangement are found, Threshold utilises Total Utility to determine the optimal arrangement. Threshold is an algorithm that aims to make everyone happy, in a way that is “fair”. However, while doing so, total utility might not be maximised.

def threshold(df):
    
    people = list(df.index)
    all_arrangements = []
    for arrangement in all_pairs(people):
        all_arrangements.append(arrangement)

    # start at a stringent threshold (1)
    T = 1
    possible_arrangements = []

    while True:
        # iterate over all arrangements, find arrangement that meets threshold
        for arrangement in all_arrangements:
            threshold_met = True

            for pair in arrangement:
                if df[pair[1]][pair[0]] > T or df[pair[0]][pair[1]] > T:
                    threshold_met = False
                    break
            
            # if threshold met, then add to list or possible arrangements
            if threshold_met == True:
                possible_arrangements.append(arrangement)

        # if possible arrangement found, break out of loop
        if possible_arrangements:
            break
        # else, make threshold more relaxed (increment by 1)
        else:
            T += 1
            continue

    if len(possible_arrangements) == 1:
        return possible_arrangements
    
    # if more than 1 possible arrangements, use util to determine optimal arrangement
    elif len(possible_arrangements) > 1:
        
        highest_score = math.inf
        optimal_arrangement = []

        for arrangement in possible_arrangements:
            score = 0

            # increment score based on how they rank each other
            for pair in arrangement:
                score = score + (df[pair[1]][pair[0]])
                score = score + (df[pair[0]][pair[1]])

            # update optimal arrangement if lowest score (thus far)
            if score < highest_score:
                highest_score = score
                optimal_arrangement = arrangement

        return optimal_arrangement

Testing The Code

Now, let’s test the code. We run the program using the the command line interface, not forgetting to add the Excel file name and the method we want to use. For this test, we will use the same Excel file mentioned above, and match participants via the Least Squares method.

python match.py test.xlsx leastsq

We get the output:

(‘person1’, ‘person2’)
(‘person3’, ‘person4’)

We successfully paired everyone in the group! Now all that’s left to do is let the participants go and enjoy their Valentine’s Day date. Happy Valentine’s Day everyone!