Sci2 Manual : 4.9 Network Analysis (With Whom?)
This page last changed on Apr 01, 2011 by dapolley.
The study of networks aims to increase our understanding of natural and man-made networks. It builds on social network analysis, physics, information science, bibliometrics, scientometrics, econometrics, informetrics, webometrics, communication theory, sociology of science, and several other disciplines.
Diverse algorithms exist to calculate specific node, edge, and network properties. Node properties include degree centrality, betweenness centrality, and hub and authority scores. Edge properties include durability, reciprocity, intensity (weak or strong), density (how many potential edges in a network actually exist), reachability (how many steps it takes to go from one "end" of a network to another), centrality (whether a network has a "center" point), quality (reliability or certainty), and strength. Network properties refer to the number of nodes and edges, network density, average path length, clustering coefficient, and distributions from which general properties such as small-world, scale-free, or hierarchical can be derived. Identifying major communities via community detection algorithms and calculating the "backbone" of a network via pathfinder network scaling or maximum flow algorithms help to communicate and make sense of large-scale networks.
4.9.1 Network Extraction
4.9.1.1 Direct Linkages
4.9.1.1.1 Document-Document (Citation) Network
Papers cite other papers via references, forming an unweighted, directed paper citation graph. It is beneficial to indicate the direction of information flow, in order of publication, via arrows. Such a representation allows citation networks to be tracked chronologically, yielding a better understanding of the influence of previous research on subsequent research and a clearer description of the scholarly relationship between individual publications. Citations to a paper support the forward traversal of the graph. Citing and being cited can be seen as roles a paper possesses (Nicolaisen, 2007).
4.9.1.1.2 Author-Author (Citation) Network
Authors cite other authors via document references, forming a weighted, directed author citation graph. Like document-document networks, author citation networks represent the flow of information over time.
Unlike document citations, however, these networks have weighted edges representing the volume of citations from one author to the next.
4.9.1.1.3 Source-Source (Citation) Network
For larger scale studies, it is often useful to explore citation patterns between entire journals and other varieties of publications. These networks can reveal both the relative importance of certain publications and the underlying connections between disciplines. These networks are directed and weighted by the volume of citations between journals.
4.9.1.1.4 Author-Paper (Consumed/Produced) Network
There are active and passive units of science. Active units, e.g., authors, produce and consume passive units, e.g., papers, patents, datasets, software. The resulting networks have multiple types of nodes, e.g., authors and papers. Directed edges indicate the flow of resources from sources to sinks, e.g., from an author to a written/produced paper to the author who reads/consumes the paper.
4.9.1.2 Co-Occurrence Linkages
4.9.1.2.1 Author Co-Occurrence (Co-Author) Network
Having the names of two authors (or their institutions or countries) listed on one paper, patent, or grant is an empirical manifestation of scholarly collaboration. The more often two authors collaborate, the higher the weight of their joint co-author link. Weighted, undirected co-authorship networks appear to have a high correlation with social networks that are themselves impacted by geospatial proximity (Börner, Penumarthy, Meiss, & Ke, 2006; Wellman, White, & Nazer, 2004).
4.9.1.2.2 Document Cited Reference Co-Occurrence (Bibliographic Coupling) Network
Papers, patents, or other scholarly records that share common references are said to be coupled bibliographically (Kessler, 1963).
The bibliographic coupling (BC) strength of two scholarly papers can be calculated by counting the number of times that they reference the same third work in their bibliographies. The coupling strength is assumed to reflect topic similarity. Co-occurrence networks are undirected and weighted.
4.9.1.2.3 Author Cited Reference Co-Occurrence (Bibliographic Coupling) Network
Authors who cite the same sources are coupled bibliographically. The bibliographic coupling (BC) strength between two authors can be said to be a measure of similarity between them. The resulting network is weighted and undirected.
4.9.1.2.4 Journal Cited Reference Co-Occurrence (Bibliographic Coupling) Network
Like document and author bibliographic coupling networks, journal cited reference co-occurrences provide a measurement of similarity between journals. Edge strength between two journals is determined by summing the number of unique references both journals cite.
4.9.1.3 Co-Citation Linkages
Two scholarly records are said to be co-cited if they jointly appear in the list of references of a third paper. The more often two units are co-cited, the higher their presumed similarity.
4.9.1.3.1 Document Co-Citation Network (DCA)
DCA was simultaneously and independently introduced by Small and Marshakova in 1973 (Marshakova, 1973; Small, 1973; Small & Greenlee, 1986). It is the logical opposite of bibliographic coupling. The co-citation frequency equals the number of times two papers are cited together, i.e., they appear together in one reference list.
4.9.1.3.2 Author Co-Citation Network (ACA)
Authors of works that repeatedly appear together in lists of references are assumed to be related. Clusters in ACA networks often reveal shared schools of thought or methodology, common subjects of study, collaborative and student-mentor relationships, ties of nationality, etc. Some regions of scholarship are densely crowded and interactive. Others are isolated or nearly vacant.
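The bibliographic coupling and co-citation counts described in 4.9.1.2.2 and 4.9.1.3.1 can be sketched in a few lines of Python. This is a minimal illustration, not the Sci2 implementation: the `references` dictionary is hypothetical toy data, and the function names `bibliographic_coupling` and `co_citation` are assumptions introduced here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each citing paper mapped to the set of works it references.
references = {
    "P1": {"A", "B", "C"},
    "P2": {"B", "C", "D"},
    "P3": {"C"},
}


def bibliographic_coupling(refs):
    """Weight of (p, q) = number of references shared by citing papers p and q."""
    weights = {}
    for p, q in combinations(sorted(refs), 2):
        shared = len(refs[p] & refs[q])
        if shared:
            weights[(p, q)] = shared
    return weights


def co_citation(refs):
    """Weight of (a, b) = number of reference lists in which a and b both appear."""
    counts = Counter()
    for cited in refs.values():
        counts.update(combinations(sorted(cited), 2))
    return dict(counts)


bc = bibliographic_coupling(references)
cc = co_citation(references)
print(bc)
print(cc)
```

Both results are undirected, weighted edge lists keyed by sorted node pairs. Bibliographic coupling links the citing papers, while co-citation links the cited works, which is the sense in which the two are opposites.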
4.9.1.3.3 Journal Co-Citation Network (JCA)
JCA networks offer wide-angle views of scholarly disciplines. Slicing these networks by time can reveal the evolution of disciplinary similarity. Like author and document co-citation networks, these are undirected and weighted.
4.9.2 Compute Basic Network Characteristics
It is often advantageous to know for a network
In the Sci2 Tool, use 'Analysis > Network Analysis Toolkit (NAT)' to get basic properties of a network, e.g., a report that includes a line such as:
This graph claims to be undirected.
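To make the basic properties named in this chapter concrete (number of nodes and edges, isolated nodes, density), here is a minimal Python sketch for a simple undirected graph. It is not the NAT implementation and does not reproduce its report format; the function name `basic_properties`, the edge-list input, and the toy graph are assumptions for illustration.

```python
def basic_properties(nodes, edges):
    """Return basic properties of a simple undirected graph.

    nodes: iterable of hashable node ids
    edges: iterable of (u, v) pairs; direction is ignored
    """
    edges = list(edges)
    node_set = set(nodes)
    # Make sure every edge endpoint is also counted as a node.
    for u, v in edges:
        node_set.update((u, v))
    # Store each undirected edge once, ignoring self-loops and duplicates.
    edge_set = {frozenset((u, v)) for u, v in edges if u != v}
    degree = {n: 0 for n in node_set}
    for edge in edge_set:
        for n in edge:
            degree[n] += 1
    n = len(node_set)
    m = len(edge_set)
    possible = n * (n - 1) / 2  # number of potential undirected edges
    return {
        "nodes": n,
        "edges": m,
        "isolated_nodes": sum(1 for d in degree.values() if d == 0),
        "density": m / possible if possible else 0.0,
        "average_degree": 2 * m / n if n else 0.0,
    }


# Hypothetical toy graph: a triangle plus one isolated node.
props = basic_properties(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "a")])
print(props)
```

Density here is the share of potential edges that actually exist, matching the definition given at the start of this chapter.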
4.9.3 Network Analysis
In the analysis menu, certain algorithms append values to each node, or delete groups of nodes and edges entirely:
4.9.4 Network Visualization
4.9.4.1 GUESS Visualizations
Load the sample dataset 'yoursci2directory/sampledata/socialscience/florentine.nwb' and calculate an additional node attribute 'Betweenness Centrality' by running 'Analysis > Networks > Unweighted and Undirected > Node Betweenness Centrality' with default parameters. Then select the network and run 'Visualization > Networks > GUESS' to open GUESS with the file loaded. It might take some time for the network to load. The initial layout will be random. Wait until the random layout is completed and the network is centered before proceeding.
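For readers who want to see what a betweenness centrality value measures, the following is a minimal Python sketch for an unweighted, undirected graph, using Brandes' algorithm. It is an illustration under stated assumptions, not the code Sci2 runs: the adjacency-dictionary input, the function name, the toy graph, and the choice to return unnormalized scores are all assumptions, and the values Sci2 reports may be scaled differently.

```python
from collections import deque


def betweenness_centrality(adj):
    """Unnormalized betweenness for an unweighted, undirected graph.

    adj: dict mapping each node to a set of neighbours (must be symmetric).
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair of endpoints was counted once from each side.
    return {v: c / 2 for v, c in score.items()}


# Hypothetical toy graph: a hub with three neighbours, one of which has a pendant node.
graph = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub", "d"},
    "d": {"c"},
}
centrality = betweenness_centrality(graph)
print(centrality)
```

In this toy graph the hub lies on the shortest path between five node pairs and node c on three, so those two nodes receive the highest scores while the leaf nodes score zero.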
4.9.4.1.1 Network Layout and Interaction
GUESS provides different network layout algorithms under the menu item 'Layout'. Apply 'Layout > GEM' to the Florentine network. Use 'Layout > Bin Pack' to compact and center the network layout. These layout algorithms often employ some degree of randomness, so layouts may look different each time they are run. Running GEM and/or Bin Pack repeatedly on the same network will likewise change the visualization each time.
Use the Graph Modifier to change node attributes, e.g.,
4.9.4.1.2 Interpreter
Use Jython, a version of Python that runs on the Java Virtual Machine, to write code in the interpreter. Here we list some GUESS commands which can be used to modify the layout.
Size code nodes
Label
The result is shown in Figure 4.16. Read https://nwb.cns.iu.edu/community/?n=VisualizeData.GUESS for more information on how to use the interpreter.
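Size coding maps a numeric node attribute, such as the betweenness centrality computed earlier, onto a range of node sizes. The sketch below shows one plausible linear mapping in plain Python. It is not a GUESS command and does not use the GUESS API: the function name `resize_linear`, the node ids, the attribute values, and the size range are all hypothetical.

```python
def resize_linear(values, min_size, max_size):
    """Map each node's attribute value linearly onto [min_size, max_size]."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    sizes = {}
    for node, value in values.items():
        # If all values are equal, every node gets the smallest size.
        fraction = (value - lo) / span if span else 0.0
        sizes[node] = min_size + fraction * (max_size - min_size)
    return sizes


# Hypothetical attribute values for three nodes.
attribute = {"n1": 0.0, "n2": 5.0, "n3": 10.0}
sizes = resize_linear(attribute, 5, 25)
print(sizes)
```

The node with the smallest attribute value receives the minimum size and the node with the largest value receives the maximum size, with everything else placed proportionally in between.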
4.9.4.2 DrL Large Network Layout
DrL is a force-directed graph layout toolbox for real-world large-scale graphs of up to 2 million nodes (Davidson, Wylie, & Boyack, 2001; Martin, Brown, & Boyack, unpublished). It includes:
This is one of the few force-directed layout algorithms that can scale to over 1 million nodes, making it ideal for large graphs. However, the algorithm doesn't always render small graphs (fewer than a hundred records) well. The algorithm expects similarity matrices as input. Distance and other networks will have to be converted before they can be laid out. For article/citation networks, feed the network into either co-citation or bibliographic coupling to compute similarity, then use the resulting similarity network as input to DrL.
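The manual does not prescribe how a distance network should be converted to similarities. As a hedged illustration only, the Python sketch below uses one common choice, similarity = 1 / (1 + distance). The function name, the edge-triple format, and the conversion formula are assumptions, not part of DrL or Sci2.

```python
def distances_to_similarities(edges):
    """Convert (source, target, distance) triples to similarity triples.

    Uses similarity = 1 / (1 + distance), so distance 0 maps to 1.0
    and larger distances map to values approaching 0.
    """
    converted = []
    for source, target, distance in edges:
        if distance < 0:
            raise ValueError("distances must be non-negative")
        converted.append((source, target, 1.0 / (1.0 + distance)))
    return converted


# Hypothetical distance-weighted edges.
distance_edges = [("a", "b", 0), ("b", "c", 1), ("a", "c", 3)]
similarity_edges = distances_to_similarities(distance_edges)
print(similarity_edges)
```

Any monotonically decreasing transform would serve the same purpose; the point is that larger edge weights should mean "more similar" before the network is handed to a similarity-based layout.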
Document generated by Confluence on May 31, 2011 15:16