NetworK Reduction Algorithm - Disparity filter

2019-04-11

The thing that network reduction algorithm does is to extract the backbone structure of undirected weighted network. There are different algorithms to e.g. k-core decomposition, minimum spanning tree and disparity filter. The main idea is to reduce the density to make the matrix or the graph more sparse.

Disparity filter

The disparity filter algorithm depends on the statistical null model.
The steps are as follows:

Create the null model of normalised weighted distribution
- we define the strength of a node $i$ is $s_i = \sum_jw_{ij}$, where $w_{ij}$ is the weight of link between i and j.
- The null model needs to be normalised to make sure that the nodes with low strength will not be biased. A normalized weight $p_{ij}$ is defined as $p_{ij} = w_{ij}/s_i$
- Distribute the strength to nodes randomly: the normalized weights of a certain node with degree k is generated like this: k − 1 pins are randomly assigned between the interval 0 and 1. The interval is divided into k subintervals. The length of the subinterval represents the normalized weight of each link in the null model.
- Based on the above null model. we can derive that the normalized weight distribution of a node with degree k follows $\rho (x)\,dx=(k-1)(1-x)^{ {k-2} }\,dx$
Disparity filter
- For a given normalized weight $p_{ij}$, the p-value $α{ij}$ of $p{ij}$ based on the null model is given by $\alpha _{ {ij} }=1-(k-1)\int {0}^{ {p{ {ij} } } }(1-x)^{ {k-2} }\,dx$ which reduces to ${\displaystyle \alpha {ij}=(1-p{ij})^{k-1} }$.
- The meaning of $α{ij}$ is the probability of having normalized weight larger or equal to $p{ij}$ in the framework of the given null model. By setting a significance level α (between 0 and 1), for any link of normalized weight $p_{ij}$, if $α_{ij}$ is larger than α, it will be filtered out (set as 0).

MATLAB code:

w_thresh = disparity_func(s)
    w=s;   % s is the input matrix (e.g. similarity matrix, connectivity matrix)
    w_null=zeros(length(w),length(w));
    for j=1:length(w) % column
        w_null(:,j)=w(:,j)/sum(w(:,j)); % compute the node strength for each 
    end
    ind_upper=find(triu(ones(length(w),length(w)),1));
    
    fprintf('Find the smallest p-value needed for graph to remain connected...\n');
    dns=linspace(0,1,200); % P-value
    for i=1:length(dns)
        w_thresh=s;                 % Initialize the w_thresh every time
        for j=1:length(ind_upper)
            [ii,jj]=ind2sub(size(w),ind_upper(j));
            if (1-w_null(ii,jj))^(K-1)>dns(i) && (1-w_null(jj,ii))^(K-1)>dns(i)
                w_thresh(ii,jj)=0; w_thresh(jj,ii)=0;
            end
        end
        [~,comp_sizes]=get_components(~~w_thresh);
        if length(comp_sizes)==1
            break
        end
    end

Reference:
wiki