Sci2 Manual : 4.10 Modeling (Why?)
This page last changed on Mar 28, 2011 by dapolley.
Data models are grouped into two major types: descriptive models and process models. Descriptive models aim to illustrate the major features of a (typically static) data set, such as statistical patterns of article citation counts, networks of citations, individual differences in citation practice, the composition of knowledge domains, or the identification of research fronts as indicated by new yet highly cited papers. Process models or predictive models aim to simulate, statistically describe, or formally reproduce statistical and dynamic characteristics of interest. 4.10.1 Random Graph ModelThe random graph model generates a graph that has a fixed number of nodes which are connected randomly by undirected edges, see Figure 4.17 (left). The number of edges depends on a specified probability. The edge probability is chosen based on the number of nodes in the graph. The model most commonly used for this purpose was introduced by Gilbert (Gilbert, 1959). This is known as the G(n,p) model with n being the number of vertices and p the linking probability. The number of edges created according to this model is not known in advance. Erd?s-Rényi introduced a similar model where all the graphs with m edges are equally probable and m varies between 0 and n(n-1)/2 (Erd?s & Rényi, 1959). This is known as the G(n,m) model. The degree distribution for this network is Poissonian, see Figure 4.17 (right). Figure 4.17: Random graph and its Poissonian node degree distribution Very few real world networks are random. However, random networks are a theoretical construct that is well understood and their properties can be exactly solved. They are commonly used as a reference, e.g., in tests of network robustness and epidemic spreading (Batagelj & Brandes). 4.10.2 Watts-Strogatz Small WorldA small-world network is one whose majority of nodes are not directly connected to one another, but still can reach any other node via very few edges. It can be used to generate networks of any size. The degree distribution is almost Poissonian for any value of the rewiring probability (except in the extreme case of rewiring probability zero, for which all nodes have equal degree). The clustering coefficient is high until beta is close to 1, and as beta approaches one, the distribution becomes Poissonian. This is because the graph becomes increasingly similar to an Erd?s-Rényi Random Graph, see Figure 4.18. (Watts & Strogatz, 1998; Inc. Wikimedia Foundation, 2009). Figure 4.18: Small world graph (left) and its node degree distribution equation (right) Small world properties are usually studied to explore networks with tunable values for the average shortest path between pairs of nodes and a high clustering coefficient. Networks with small values for the average shortest path and large values for the clustering coefficient can be used to simulate social networks, unlike ER random graphs, which have small average shortest path lengths, but low clustering coefficients. 4.10.3 Barabási-Albert Scale Free ModelThe Barabási-Albert (BA) model is an algorithm which generates a scale-free network by incorporating growth and preferential attachment. Starting with an initial network of a few nodes, a new node is added at each time step. Older nodes with a higher degree have a higher probability of attracting edges from new nodes. The probability of attachment is given by The initial number of nodes in the network must be greater than two and each of these nodes must have at least one connection. The final structure of the network does not depend on the initial number of nodes in the network. The degree distribution of the generated network is a power law with a scaling coefficient of -3 (Barabási & Albert, 1999; Barabási & Albert, 2002). Figure 4.19 shows the network on the left and the probability distribution on a log-log scale on the right.
Figure 4.19: Scale free graph (left) and its node degree distribution (right) This is the simplest known algorithm to generate a scale-free network. It can be applied to model undirected networks such as the collaboration network among scientists, the movie actor network, and other social networks where the connections between the nodes are undirected. However, it cannot be used to generate a directed network. ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
![]() |
Document generated by Confluence on May 31, 2011 15:16 |