SpeakEasy Algorithm

SpeakEasy is a label propagation algorithm related to Speaker-listener Label Propagation Algorithm (SLPA), but it has been adapted to perform well on common types of biological data.

The outline of the algorithm is presented in the following pseudocode:

More...

In summary, initially each node is assigned a random unique label. Then for some small number of iterations (usually less than 30), each node updates its status to the label found among nodes connected to it which has the greatest specificity, i.e. the label with the greatest difference between the actual and the expected frequency. Positively or negatively-weighted links between nodes (often produced when clustering correlation-based networks) are easily incorporated into SpeakEasy, as they provide relative increases or decreases in the popularity of a particular label. The label updating step is performed simultaneously for all nodes. Although there is the potential for oscillating states to emerge with a simultaneous update step, in practice this is not observed in SpeakEasy. Cluster accuracy improves when labels from the last several time-steps are included in the calculation of expected and actual labels. However, initially the network has no history of labels, so we create an artificial buffer of random neighboring labels. This buffer prevents the algorithm from becoming trapped in an early equilibrium, and also provides unique initial conditions, which are useful when clustering the same dataset multiple times. More...