### Formalism

Let \({\mathcal {G}}\) be the ensemble of all simple graphs of *N* vertices. If *P* is a probability distribution over \({\mathcal {G}}\), *P*(*G*) is the probability of graph \(G\in {\mathcal {G}}\), and \(\left\langle \cdot \right\rangle _P\) denotes the expectation with respect to *P*. The vertex set \(V=\{v_i\}_{i=0}^{N-1}\) is partitioned into *n* blocks \(\{B_I\}_{I=0}^{n-1}\). The size of block *I* is \(N_I=|B_I|\) and, for each \(v_i\in V\), \(I_i\) denotes the index of the block to which \(v_i\) belongs. For all pairs *I*, *J*, \(N_{IJ}\) denotes the number of possible pairs (*i*, *j*) with \(v_i\in I\), \(v_j\in J\) and \(i\ne j\), i.e., \(N_{II}=N_I(N_I-1)\) and \(N_{IJ}=N_IN_J\) if \(I\ne J\)—notice that in the case of \(N_{II}\) we are counting twice the number of couples, as we will do for the counts of edges in the following.

Each \(G\in {\mathcal {G}}\) is uniquely determined by its adjacency matrix \(A(G)=\{a_{ij}(G)\}_{i,j=0}^{N-1}\), where \(a_{ij}(G)=1\) if edge \((i,j)\in E(G)\) and \(a_{ij}(G)=0\) otherwise. The degree of vertex \(v_i\) in *G* is \(\deg _i(G)=\sum _j a_{ij}(G)\). The total degree of block *I* is \(\deg _I(G) = \sum _{i\in I} \deg _i(G)\) and, for all *I*, *J*, \(L_{IJ}(G)=\sum _{i\in I}\sum _{j\in J} a_{ij}(G)\) is the number of edges between \(B_I\) and \(B_J\) in *G*, or, if \(I=J\), twice that number. Therefore, using the definitions, \(\deg _I(G)=\sum _J L_{IJ}(G)\). For the sake of simplicity, the dependence of these quantities on the specific graph *G* will be often omitted in the following.

### Maximum entropy Degree-Corrected Block Model^{4}

The maximum entropy Degree-Corrected Block Model (DCBM) is defined as the maximum entropy probability distribution *P* over \({\mathcal {G}}\) in which the number of links per block and the degree sequence are constrained on average, i.e.

$$\begin{aligned} \left\langle L_{IJ}\right\rangle _P&= K_{IJ} \quad \text {for all } I,J \end{aligned}$$

(7)

$$\begin{aligned} \left\langle \deg _i\right\rangle _P&= k_i \quad \text {for all } i, \end{aligned}$$

(8)

with \(\sum _J K_{IJ}=\sum _{i\in I} k_i\), for all *I*. If *H*(*P*) denotes the Shannon entropy of *P*, the sought *P* can be obtained by finding the stationary points of

$$\begin{aligned} H'(P,\vec {\eta },\vec {\theta })=H(P)-C(P,\vec {\eta },\vec {\theta }) = -\left\langle \ln P(G)\right\rangle _P – \left( \sum _{I\le J}\eta _{IJ}\left( \left\langle L_{IJ}\right\rangle _P-K_{IJ}\right) + \sum _i\theta _i\left( \left\langle \deg _i\right\rangle _P-k_i\right) +\alpha (\sum _G P(G)-1)\right) \end{aligned}$$

(9)

where \(\eta _{IJ}\), \(\theta _i\) and \(\alpha \) are Lagrange multipliers: while, \(\eta _{IJ}\) and \(\theta _i\) control the conditions (7) and (8), respectively, \(\alpha \) is necessary for the normalization of the probability *P*(*G*). Since the functional derivatives with respect to *P*(*G*) are \(\frac{\mathrm{d}\left\langle L_{IJ}\right\rangle _P}{\mathrm{d}P(G)}= \sum \nolimits _{i\in I}\sum \limits _{j\in J} a_{ij}(G)\) and \(\frac{\mathrm{d}\left\langle \deg _i\right\rangle _P}{\mathrm{d}P(G)}= \sum \nolimits _j a_{ij}(G)\), then

$$\begin{aligned} \frac{\mathrm{d}C(P,\vec {\eta },\vec {\theta })}{\mathrm{d}P(G)}= \sum _{I\le J}\eta _{IJ}\sum _{i\in I}\sum _{j\in J} a_{ij}(G) + \sum _i \theta _i \sum _j a_{ij}(G) +\alpha = \sum _{i<j}\eta _{I_iJ_j}a_{ij}(G) + \sum _{i<j} (\theta _i + \theta _j) a_{ij}(G)+\alpha \end{aligned}$$

This results in

$$\begin{aligned} P(G)\propto \exp \left[ -\sum _{i<j}(\theta _i+\theta _j+\eta _{I_iJ_j})a_{ij}(G)\right] \end{aligned}$$

and the probability per graph factorises in terms of probabilities per link as

$$\begin{aligned} p_{ij}={\left\{ \begin{array}{ll} \dfrac{x_ix_jy_{I_iJ_j}}{1+x_ix_jy_{I_iJ_j}} &{}\text {if } i\ne j\\ 0 &{}\text {otherwise} \end{array}\right. } \end{aligned}$$

(10)

where \(x_i=e^{-\theta _i}\) and \(y_{IJ}=e^{-\eta _{IJ}}\). For all *i* and all \(I\le J\), \(x_i\) and \(y_{IJ}\) can be found by solving the system of equations

$$\begin{aligned}&\sum _{i\in I}\sum _{j\in J}p_{ij} = K_{IJ} \quad \text {for all } I\le J \end{aligned}$$

(11)

$$\begin{aligned}&\sum _j p_{ij} = k_i \quad \text {for all } i \end{aligned}$$

(12)

#### Sparse DCBM

When the average degree \(\left\langle k\right\rangle \) is small, i.e., when the network is sparse, the edge probability of the DCBM can be approximated as \(p_{ij}\approx x_ix_jy_{I_iJ_j}\). This allows to rewrite (11) and (12) as

$$\begin{aligned} y_{IJ} \sum _{i\in I} x_i \sum _{\begin{array}{c} j\in J\\ j\ne i \end{array}}x_j&= K_{IJ} \quad \text {for all } I\le J \end{aligned}$$

(13)

$$\begin{aligned} x_i \sum _J y_{I_iJ} \sum _{\begin{array}{c} j\in J\\ j\ne i \end{array}}x_j&= k_i \quad \text {for all } i \end{aligned}$$

(14)

Since the constants *K* and \(\vec {k}\) are, by construction, bound by the relation \(\sum _{J} K_{IJ} = \left\langle \deg _I\right\rangle = \sum _{i\in I} k_i\), (13) and (14) admit the following solution

$$\begin{aligned} x_i&= k_i \quad \text {for all } i\\ y_{IJ}&= \frac{K_{IJ}}{\left( \sum _{i\in I}k_i\right) \left( \sum _{\begin{array}{c} j\in J\\ j\ne i \end{array}}k_j\right) } \quad \text {for all } I\le J \end{aligned}$$

### Maximum entropy Fitness-Corrected Block Model

Given a scalar \(p\in (0,1)\), a symmetric matrix \(\Delta \) such that \(\sum _{I,J}\Delta _{I,J}=2\), and a fitness sequence \(\vec {f}\), we define the Fitness-Corrected Block Model (FCBM) as the maximum entropy model fulfilling the following two conditions:

$$\begin{aligned} \left\langle L_{IJ}\right\rangle _P&= p\left( {\begin{array}{c}N\\ 2\end{array}}\right) \Delta _{IJ} \quad \text {for all } I, J \end{aligned}$$

(15)

$$\begin{aligned} \left\langle \deg _i\right\rangle _P&= p\left( {\begin{array}{c}N\\ 2\end{array}}\right) \frac{f_i}{\sum _{u\in I_i} f_u} \sum _J \Delta _{I_iJ} \quad \text {for all } i \end{aligned}$$

(16)

The FCBM is a variation of the DCBM where the network density *p* is a configuration parameter. As in the original stochastic block model, the block-level mixing is specified in terms of a set of edge-densities \(\Delta _{IJ}\), rather than a set of edge-counts \(K_{IJ}\). Similarly, the degree sequence \(\vec {k}\) is replaced by a vertex intrinsic fitness sequence \(\vec {f}\), in line with previous models available in the literature^{5,6}. \(f_i\) measures \(v_i\)’s propensity to establish links and \(\left\langle \deg _i\right\rangle _P\) is set proportional to \(f_i\) by a constant that depends on \(I_i\), other than *p*. By design, (15) and (16) imply \(\sum _J\left\langle L_{IJ}\right\rangle _P = \left\langle \deg _I\right\rangle _P = \sum _{i\in I} \left\langle \deg _i\right\rangle _P\), and (16) can be rewritten as

$$\begin{aligned} \left\langle \deg _i\right\rangle _P = \frac{f_i}{\sum _{u\in I_i} f_u} \left\langle \deg _I\right\rangle _P \end{aligned}$$

which clarifies the role of \(\vec {f}\) as an element of intra-block heterogeneity.

For fixed \(\Delta \) and \(\vec {f}\), the derivation of the maximum entropy FCBM is identical to that of the DCBM: the maximum entropy probability per graph factorizes into the probability per edge given by (10) and the vectors of constants \(\vec {x}\) and \(\vec {y}\) can be obtained by solving the analogous of (11) and (12), i.e.

$$\begin{aligned}&\sum _{i\in I}\sum _{j\in J}p_{ij} = p\left( {\begin{array}{c}N\\ 2\end{array}}\right) \Delta _{IJ} \quad \text {for all } I\le J \end{aligned}$$

(17)

$$\begin{aligned}&\sum _j p_{ij} = p\left( {\begin{array}{c}N\\ 2\end{array}}\right) \frac{f_i}{\sum _{u\in I_i} f_u} \sum _J \Delta _{I_iJ} \quad \text {for all } i \end{aligned}$$

(18)

#### Sparse FCBM

In the sparse-regime, i.e, when \(p\ll 1\), using \(p_{ij}\approx x_ix_jy_{I_iJ_j}\) yields the following approximate solution to (17) and (18):

$$\begin{aligned} x_i&= p\left( {\begin{array}{c}N\\ 2\end{array}}\right) \frac{f_i}{\sum _{u\in I_i} f_u} \sum _J \Delta _{I_iJ} = \frac{f_i}{\sum _{u\in I_i} f_u} \left\langle \deg _I\right\rangle _P \quad \text {for all } i\\ y_{IJ}&= \frac{p\left( {\begin{array}{c}N\\ 2\end{array}}\right) \Delta _{IJ}}{ p\left( {\begin{array}{c}N\\ 2\end{array}}\right) \left( \sum _O \Delta _{IO} \right) p\left( {\begin{array}{c}N\\ 2\end{array}}\right) \left( \sum _O \Delta _{OJ} \right) } = \frac{p\left( {\begin{array}{c}N\\ 2\end{array}}\right) \Delta _{IJ}}{\left\langle \deg _I\right\rangle _P \left\langle \deg _J\right\rangle _P} \quad \text {for all } I\le J \end{aligned}$$

In this regime, the probability per edge can thus be rewritten as

$$\begin{aligned} p_{ij} \approx p\left( {\begin{array}{c}N\\ 2\end{array}}\right) \frac{f_i}{\sum _{u\in I_i} f_u} \frac{f_j}{\sum _{w\in J_j} f_w} \Delta _{I_iJ_j}. \end{aligned}$$

(19)

#### Degree distribution for the sparse FCBM

Let us assume that each fitness value \(f_i\) is drawn from a suitable distribution \(p_f\). The sparse-regime approximation allows to estimate the expected degree distribution of the sampled graph *G* with respect to both \(p_f\) and the graph sampling probability *P*. If \(N_{I_i}\) is large enough, we have \(\sum _{u\in I_i }f_u \approx \left\langle f\right\rangle _{p_f} N_{I_i}\). Now, following the approach used in Ref.^{5}, we have

$$\begin{aligned} k(f,I) = \left\langle \deg _i\bigm \vert f_i=f,I_i=I\right\rangle _{p_f,P} \approx \dfrac{f}{\left\langle f\right\rangle _{p_f}N_{I}}\left\langle \deg _I\right\rangle _P = \dfrac{f}{\left\langle f\right\rangle _{p_f}} \mu _I \end{aligned}$$

(20)

where \(\mu _I = \frac{\left\langle \deg _I\right\rangle _P}{N_I} = \left\langle \deg _i \bigm \vert I_i=I\right\rangle _P\) is the expected average degree of the vertices in *I*. Equation (20) can be inverted leading to estimate

$$\begin{aligned} f(k,I) \approx k \frac{\left\langle f\right\rangle _{p_f}}{\mu _I} \end{aligned}$$

so that

$$\begin{aligned} p_k(k,I) = \Pr [k(f)=k\bigm \vert I] = \Pr [f(k)=f\bigm \vert I]\frac{\mathrm{d}f(k)}{\mathrm{d}k} \approx p_f\left( k \frac{\left\langle f\right\rangle _{p_f}}{\mu _I}\right) \frac{\left\langle f\right\rangle _{p_f}}{\mu _I} \end{aligned}$$

(21)

and, hence,

$$\begin{aligned} p_k(k) = \Pr [k(f)=k] = \sum _I p_k(k,I)\frac{N_I}{N} \approx \sum _I p_f\left( k \frac{\left\langle f\right\rangle _{p_f}}{\mu _I}\right) \frac{\left\langle f\right\rangle _{p_f}}{\mu _I} \frac{N_I}{N} = \frac{\left\langle f\right\rangle _{p_f}}{N} \sum _I p_f\left( k \frac{\left\langle f\right\rangle _{p_f}}{\mu _I}\right) \frac{N_I}{\mu _I} \end{aligned}$$

(22)

A slightly less accurate, yet much simpler, approximation can be obtained computing first

$$\begin{aligned} k(f) = \left\langle \deg _i\bigm \vert f_i=f\right\rangle _{p_f,P} = \sum _I k(f,I) \frac{N_I}{N} \approx \frac{f}{\left\langle f\right\rangle _{p_f}} \sum _I \mu _I \frac{N_I}{N} = \frac{f}{\left\langle f\right\rangle _{p_f}} \mu \end{aligned}$$

(23)

where \(\mu =\sum _I \mu _I \frac{N_I}{N}\) is the average degree of the network. Then, (23) can be inverted as

$$\begin{aligned} f(k) \approx k \frac{\left\langle f\right\rangle _{p_f}}{\mu } \end{aligned}$$

yielding

$$\begin{aligned} p_k(k) = \Pr [k(f)=k] = \Pr [f(k)=f]\frac{\mathrm{d}f(k)}{\mathrm{d}k} \approx p_f\left( k \frac{\left\langle f\right\rangle _{p_f}}{\mu }\right) \frac{\left\langle f\right\rangle _{p_f}}{\mu } \end{aligned}$$

(24)

In many cases, \(p_k\) belongs to the same family of probability distributions of \(p_f\): e.g., if \(p_f\) follows a power-law, lognormal or exponential distribution, then the same holds for \(p_k\).

### Data-driven FCBM for spatial social networks

Our FCBM can be easily tuned upon real data and empirical findings to produce instances of a maximum entropy spatial social network. On one hand, the local density and demographic profile of the population are generally available in the form of discrete, geographically located (e.g., residents in 500 m \(\times \) 500 m tiles) and/or age-stratified (e.g., 0–5 years old) population segments. These data naturally induce a partition of the population into blocks. On the other hand, intra-block population heterogeneity can be controlled by a vertex-related social fitness, possibly modelled upon measurable features such as wealth, employment or mobility.

Formally, let the vertex set *V* describe an age-stratified population of *N* individuals living in a territory tessellated into square tiles of side *l*. Each \(v_i\) is characterized by two data-driven discrete attributes: its tile of residence \(t_i\in T\), that is, the discretized position of \(v_i\) in the territory, and its age-group \(g_i\in \Gamma \). \(t_i\) and \(g_i\) may either be directly available—in the case of a real population—or be drawn, respectively, from given spatial density \(p_t\) and age-distribution \(p_g\)—in the case of a synthetic population. These two attributes induce a partition of the population into \(n=|T|\cdot |\Gamma |\) blocks \(\{B_I\}_{i=0}^{n-1}\), with \(v_i\in B_I=(t_I,g_I)\) if and only if \(t_i=t_I\) and \(g_i=g_I\). We embrace the widely-acknowledged assumption that an inverse-power-law relation binds the distance \(d_{IJ}\) and the frequency of social relations between individuals living in \(t_I\) and \(t_J\)^{12,13}. For all pairs of blocks *I*, *J*, we thus define the edge-density

$$\begin{aligned} \Delta _{IJ} = \frac{d_{IJ}^{-\beta }s_{IJ}}{\sum _{O\le Q}d_{OQ}^{-\beta }s_{OQ}} \end{aligned}$$

where \(d_{IJ}\) is the normalized (geographic or euclidean) distance between tiles \(t_I\) and \(t_J\); the normalization is obtained through a division by \(\frac{l}{2}\). We set \(d_{II}=1\), so that the distance between individuals in the same tile is half the distance of individuals living in neighboring tiles. \(\beta >0\) is a configuration parameter. \(s_{IJ}\) measures the tendency of age groups \(g_I\) and \(g_J\) to socialize with each other; such a \(|\Gamma |\times |\Gamma |\) symmetric age-based social mixing matrix *S* can be obtained, by imposing reciprocity and normalizing, from a suitable data-driven contact matrix^{9}.

Finally, we extract the fitness vector \(\vec {f}\), set the density parameter \(p\in (0,1)\) and, for all pairs *i*, *j*, compute the edge probability \(p_{ij}\) as described for the FCBM. As a result, the graph sampling probability *P* guarantees that: (1) the expected number of links between \(B_I\) and \(B_J\) is proportional to \(s_{I,J}\) and decays as \(d_{I,J}^{-\beta }\); (2) the expected degree of \(v_i\) is proportional to \(f_i\) and to the expected total degree of block \(I_i\).

In this case, the sparse-regime approximation yields

$$\begin{aligned} p_{ij} \approx p \left( {\begin{array}{c}N\\ 2\end{array}}\right) \frac{f_i}{\sum _{u\in I_i} f_u} \frac{f_j}{\sum _{w\in J_j} f_w}\frac{d_{I_iJ_j}^{-\beta }s_{I_iJ_j}}{\sum _{I\le J}d_{IJ}^{-\beta }s_{IJ}} \end{aligned}$$

(25)

Expression (25) defines a phenomenological model, where the probability of two individuals being connected is proportional to their sociability and to the cohesion of their age-groups, while decaying as a power of their distance. Clearly, the estimates obtained in (22) and (24) for the degree distribution stay valid in the data-driven model. If the network is sufficiently sparse and the population of all tiles/groups is sufficiently large, the degree distribution of the sampled graph *G* is controlled by the available data and by the chosen fitness distribution \(p_f\).