
What is: Spectral Gap Rewiring Layer?

Source: DiffWire: Inductive Graph Rewiring via the Lovász Bound
Year: 2022
Data Source: CC BY-SA - https://paperswithcode.com

TL;DR: GAP-Layer is a GNN layer that rewires a graph in an inductive and parameter-free way by optimizing the spectral gap (minimizing or maximizing the bottleneck size), learning a differentiable way to compute the Fiedler vector and the Fiedler value of the graph.

Summary

GAP-Layer is a rewiring layer based on minimizing or maximizing the spectral gap (or graph bottleneck size) in an inductive way. Depending on the mining task we want to perform on our graph, we may want to maximize or minimize the size of the bottleneck, aiming for more connected or more separated communities.

GAP-Layer: Spectral Gap Rewiring

Loss and derivatives using $\mathbf{L}$ or $\mathcal{L}$

For this explanation, we will suppose we want to minimize the spectral gap, i.e. make the graph bottleneck smaller. To minimize the spectral gap, we minimize this loss:

$$L_{Fiedler} = \|\tilde{\mathbf{A}}-\mathbf{A}\|_F + \alpha(\lambda_2)^2$$

The gradients of this cost function w.r.t. each element of $\mathbf{A}$ are not trivial. Depending on whether we use the Laplacian $\mathbf{L}$ or the normalized Laplacian $\mathcal{L}$, the derivatives are different. In the former case ($\mathbf{L}$), we use the derivatives presented in Kang et al. 2019. In the latter case ($\mathcal{L}$), we present the Spectral Gradients: derivatives of the spectral gap w.r.t. the normalized Laplacian. Whichever option we choose, $\lambda_2$ can be seen as a function of $\tilde{\mathbf{A}}$ and, hence, $\nabla_{\tilde{\mathbf{A}}}\lambda_2$, the gradient of $\lambda_2$ w.r.t. each component of $\tilde{\mathbf{A}}$ (how does the bottleneck change with each change in our graph?), comes from the chain rule of the matrix derivative $Tr\left[\left(\nabla_{\tilde{\mathbf{L}}}\lambda_2\right)^T\cdot\nabla_{\tilde{\mathbf{A}}}\tilde{\mathbf{L}}\right]$ if using the Laplacian, or $Tr\left[\left(\nabla_{\tilde{\mathcal{L}}}\lambda_2\right)^T\cdot\nabla_{\tilde{\mathbf{A}}}\tilde{\mathcal{L}}\right]$ if using the normalized Laplacian. Both of these derivatives rely on the Fiedler vector (the 2nd eigenvector: $\mathbf{f}_2$ if we use $\mathbf{L}$, and $\mathbf{g}_2$ if using $\mathcal{L}$ instead). For more details on those derivatives, and for the sake of simplicity in this blog explanation, I suggest going to the original paper.
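To make this more concrete, here is a minimal PyTorch sketch of the unnormalized-Laplacian case, using the fact that $\lambda_2 = \mathbf{f}_2^T\mathbf{L}\mathbf{f}_2$ with $\mathbf{L} = \mathbf{D}-\mathbf{A}$, so for a symmetric $\mathbf{A}$ the derivative w.r.t. an edge $(u, v)$ reduces to $(\mathbf{f}_2(u)-\mathbf{f}_2(v))^2$. The function name and the dense-matrix assumption are mine; the Spectral Gradients for the normalized Laplacian are not reproduced here.

```python
import torch

def fiedler_grad_wrt_A(f2: torch.Tensor) -> torch.Tensor:
    """Gradient of lambda_2 w.r.t. the entries of a symmetric adjacency A (sketch).

    Based on lambda_2 = f2^T (D - A) f2: for an undirected graph, the derivative
    w.r.t. the edge (u, v) is (f2[u] - f2[v])^2.

    f2: (n,) approximate Fiedler vector (unit norm, orthogonal to the constant vector).
    Returns an (n, n) tensor with one entry per potential edge.
    """
    diff = f2.unsqueeze(1) - f2.unsqueeze(0)  # diff[u, v] = f2[u] - f2[v]
    return diff.pow(2)
```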

Differentiable approximation of $\mathbf{f}_2$ and $\lambda_2$

Once we have those derivatives, the problem is still not trivial. Note that our cost function $L_{Fiedler}$ relies on an eigenvalue, $\lambda_2$. In addition, the derivatives also depend on the Fiedler vector $\mathbf{f}_2$ or $\mathbf{g}_2$, which is the eigenvector corresponding to that eigenvalue. However, we DO NOT COMPUTE IT SPECTRALLY, as its computation has a complexity of $O(n^3)$ and would need to be repeated in every learning iteration. Instead, we learn an approximation of $\mathbf{f}_2$ and use its Dirichlet energy ${\cal E}(\mathbf{f}_2)$ to approximate $\lambda_2$.

$$\mathbf{f}_2(u) = \begin{cases} +1/\sqrt{n} & \text{if } u \text{ belongs to the first cluster} \\ -1/\sqrt{n} & \text{if } u \text{ belongs to the second cluster} \end{cases}$$

In addition, if using $\mathcal{L}$, since $\mathbf{g}_2=\mathbf{D}^{1/2}\mathbf{f}_2$, we first approximate $\mathbf{g}_2$ and then approximate $\lambda_2$ from ${\cal E}(\mathbf{g}_2)$. With this approximation, we can easily compute the cluster each node belongs to with a simple MLP. In addition, since the Fiedler vector must satisfy orthogonality and normality, those restrictions must be added to the MLP clustering.
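As a rough illustration of this idea, the sketch below learns a two-way soft assignment with an MLP and derives the approximate Fiedler vector and its Dirichlet energy from it. The class name, the hidden size, and the soft sign trick `S[:, 0] - S[:, 1]` are my own simplifications, not the paper's exact implementation (which also enforces the orthogonality and normality restrictions mentioned above).

```python
import torch
import torch.nn as nn

class FiedlerApprox(nn.Module):
    """Sketch: 2-way soft clustering -> approximate Fiedler vector -> approximate lambda_2."""

    def __init__(self, in_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),
        )

    def forward(self, X: torch.Tensor, L: torch.Tensor):
        n = X.size(0)
        S = torch.softmax(self.mlp(X), dim=-1)   # (n, 2) soft cluster assignments
        # Soft version of f2(u) = +1/sqrt(n) or -1/sqrt(n), depending on the cluster of u
        f2 = (S[:, 0] - S[:, 1]) / n ** 0.5      # (n,)
        # Dirichlet energy E(f2) = f2^T L f2 is the differentiable proxy for lambda_2
        lam2 = f2 @ L @ f2
        return S, f2, lam2
```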

GAP-Layer

To sum up, the GAP-Layer can be defined as follows. Given the matrix $\mathbf{X}_{n\times F}$ encoding the features of the nodes after any message passing (MP) layer, $\mathbf{S}_{n\times 2}=\textrm{Softmax}(\textrm{MLP}(\mathbf{X}))$ learns the association $\mathbf{X}\rightarrow \mathbf{S}$, while $\mathbf{S}$ is optimized according to the loss:

$$L_{Cut} = -\frac{Tr[\mathbf{S}^T\mathbf{A}\mathbf{S}]}{Tr[\mathbf{S}^T\mathbf{D}\mathbf{S}]} + \left\|\frac{\mathbf{S}^T\mathbf{S}}{\|\mathbf{S}^T\mathbf{S}\|_F} - \frac{\mathbf{I}_2}{\sqrt{2}}\right\|_F$$
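A minimal dense-matrix sketch of this loss could look as follows; the function name and the assumption of a dense adjacency tensor are mine.

```python
import torch

def cut_loss(S: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
    """Sketch of L_Cut for a soft assignment S of shape (n, 2) and adjacency A of shape (n, n)."""
    D = torch.diag(A.sum(dim=-1))                 # degree matrix
    mincut_term = -torch.trace(S.t() @ A @ S) / torch.trace(S.t() @ D @ S)
    StS = S.t() @ S                               # (2, 2)
    k = S.size(1)                                 # number of clusters, 2 here
    ortho_term = torch.norm(StS / torch.norm(StS) - torch.eye(k, device=S.device) / k ** 0.5)
    return mincut_term + ortho_term
```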

Then, $\mathbf{f}_2$ is approximated from $\mathbf{S}$ using the $\mathbf{f}_2(u)$ equation above. Once $\mathbf{f}_2$ and $\lambda_2$ are calculated, we consider the loss:

$$L_{Fiedler} = \|\tilde{\mathbf{A}}-\mathbf{A}\|_F + \alpha(\lambda_2)^2$$

The gradient step $\tilde{\mathbf{A}} = \mathbf{A} - \mu \nabla_{\tilde{\mathbf{A}}}\lambda_2$ returns $\tilde{\mathbf{A}}$. Then the GAP diffusion $\mathbf{T}^{GAP} = \tilde{\mathbf{A}}(\mathbf{S}) \odot \mathbf{A}$ results from minimizing

$$L_{GAP} = L_{Cut} + L_{Fiedler}$$
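Putting the pieces together, a hypothetical forward pass of the layer might combine the earlier sketches like this (reusing `fiedler_grad_wrt_A` and `cut_loss` from above; `mu` and `alpha` are illustrative hyper-parameters, not values prescribed by the paper):

```python
import torch

def gap_layer_step(A, S, f2, lam2, mu: float = 0.1, alpha: float = 1.0):
    """Sketch: rewired adjacency and total GAP loss from the quantities above."""
    grad_lam2 = fiedler_grad_wrt_A(f2)     # nabla_A lambda_2, to shrink the spectral gap
    A_tilde = A - mu * grad_lam2           # one gradient step on the adjacency
    T_gap = A_tilde * A                    # GAP diffusion: element-wise product with A
    l_fiedler = torch.norm(A_tilde - A) + alpha * lam2 ** 2
    l_gap = cut_loss(S, A) + l_fiedler     # L_GAP = L_Cut + L_Fiedler
    return T_gap, l_gap
```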

References

(Kang et al. 2019) Kang, J., & Tong, H. (2019). N2N: Network derivative mining. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (pp. 861-870).