This page last changed on Apr 17, 2011 by scott.
Data Preparation
Text Files
- Remove ISI Duplicate Records – Removes duplicate publications from ISI records based on the ISI Unique ID attribute.
- Remove Rows with Multitudinous Fields – Removes rows having at least N entries within a given field.
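Both cleaning steps are row filters. The sketch below makes several assumptions. Records are Python dicts. The ISI Unique ID is stored under the key "UT". Multi-valued fields such as authors are "|"-delimited. Sci2's own rule for which duplicate to keep may differ; this sketch keeps the first copy.

```python
def remove_duplicate_records(records, id_field="UT"):
    """Keep the first record for each unique ID (the "UT" key is an assumption)."""
    seen = set()
    kept = []
    for record in records:
        uid = record.get(id_field)
        if uid is not None and uid in seen:
            continue  # a later copy of an ID that was already kept
        if uid is not None:
            seen.add(uid)
        kept.append(record)  # records without an ID are always kept
    return kept


def remove_multitudinous_rows(records, field, n, delimiter="|"):
    """Drop rows whose `field` holds at least `n` delimiter-separated entries."""
    def entry_count(value):
        return len([v for v in value.split(delimiter) if v.strip()])

    return [r for r in records if entry_count(r.get(field, "")) < n]


records = [
    {"UT": "000001", "AU": "Smith, J|Doe, A"},
    {"UT": "000001", "AU": "Smith, J|Doe, A"},
    {"UT": "000002", "AU": "Lee, K|Kim, B|Park, C|Cho, D"},
]
unique = remove_duplicate_records(records)
filtered = remove_multitudinous_rows(unique, "AU", 4)
print(len(unique), len(filtered))  # 2 1
```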
---------------------------------------------
- Extract Directed Network – Creates a directed network by placing a directed edge from the values in a given column to the values in a different column.
- Extract Bipartite Network – Creates an unweighted bipartite network by placing a directed edge from the values in a given column to the values in a different column.
- Extract Paper Citation Network – Extracts an unweighted directed network from papers to their citations.
- Extract Author Paper Network – Extracts an unweighted directed network from authors to their papers.
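Extract Directed Network can be sketched as a pairing of cell values across two columns. Pointing it at an author column and a title column gives an author-to-paper network like the one Extract Author Paper Network produces. The column names "AU" and "TI" and the "|" delimiter are hypothetical. The sketch returns edge counts; for an unweighted network, use only the edge keys.

```python
from collections import Counter


def split_cell(value, delimiter="|"):
    """Split a multi-valued cell into trimmed, non-empty entries."""
    return [v.strip() for v in value.split(delimiter) if v.strip()]


def extract_directed_network(rows, source_col, target_col, delimiter="|"):
    """Directed edges from each value in source_col to each value in target_col of the same row."""
    edges = Counter()  # (source, target) -> number of rows producing that edge
    for row in rows:
        for s in split_cell(row.get(source_col, ""), delimiter):
            for t in split_cell(row.get(target_col, ""), delimiter):
                edges[(s, t)] += 1
    return edges


# Hypothetical author ("AU") and title ("TI") columns.
rows = [{"AU": "Smith|Doe", "TI": "Paper A"}, {"AU": "Smith", "TI": "Paper B"}]
edges = extract_directed_network(rows, "AU", "TI")
print(sorted(edges))  # [('Doe', 'Paper A'), ('Smith', 'Paper A'), ('Smith', 'Paper B')]
```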
---------------------------------------------
- Extract Co-Occurrence Network – Extracts a network from a delimited table column, linking values that occur together in the same row.
- Extract Word Co-Occurrence Network – Creates a weighted network in which each node is a word and edges connect words that occur together. The weight of an edge represents how often the two words occur together in the same body of text.
- Extract Co-Author Network – Extracts a weighted network with authors as nodes and edge weights as the number of times those authors co-wrote a paper.
- Extract Reference Co-Occurrence (Bibliographic Coupling) Network – Extracts a weighted network from a Paper Citation network, with papers as nodes and edge weights as the number of citations two papers share.
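Extract Co-Author Network amounts to a co-occurrence count over one multi-valued column: every pair of values that share a row gains one unit of edge weight. The sketch below uses a hypothetical "AU" column and "|" delimiter. Each undirected edge is stored as an alphabetically ordered pair.

```python
from collections import Counter
from itertools import combinations


def extract_co_occurrence_network(rows, column, delimiter="|"):
    """Weighted undirected edges between values that appear in the same row."""
    weights = Counter()
    for row in rows:
        # Deduplicate within a row and sort so each pair has one canonical order.
        values = sorted({v.strip() for v in row.get(column, "").split(delimiter) if v.strip()})
        for a, b in combinations(values, 2):
            weights[(a, b)] += 1
    return weights


papers = [{"AU": "Smith|Doe"}, {"AU": "Doe|Smith|Lee"}, {"AU": "Lee"}]
net = extract_co_occurrence_network(papers, "AU")
print(net[("Doe", "Smith")], net[("Doe", "Lee")])  # 2 1
```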
---------------------------------------------
- Extract Document Co-Citation Network – Extracts a weighted network from a Paper Citation network, with papers as nodes and edge weights as the number of times two papers are cited together.
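Bibliographic coupling and document co-citation are mirror images computed from the same Paper Citation data. The sketch assumes the citation network is held as a dict mapping each citing paper to the set of references it cites; this layout is hypothetical. Coupling compares every pair of citing papers, so it is quadratic in the number of papers.

```python
from collections import Counter
from itertools import combinations


def bibliographic_coupling(citations):
    """Edge weight = number of references two citing papers share."""
    weights = Counter()
    for a, b in combinations(sorted(citations), 2):
        shared = len(citations[a] & citations[b])
        if shared:
            weights[(a, b)] = shared
    return weights


def co_citation(citations):
    """Edge weight = number of papers that cite both references."""
    weights = Counter()
    for refs in citations.values():
        for a, b in combinations(sorted(refs), 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical layout: citing paper -> set of cited references.
citations = {"P1": {"R1", "R2"}, "P2": {"R1", "R2", "R3"}, "P3": {"R3"}}
print(dict(bibliographic_coupling(citations)))  # {('P1', 'P2'): 2, ('P2', 'P3'): 1}
print(dict(co_citation(citations)))  # {('R1', 'R2'): 2, ('R1', 'R3'): 1, ('R2', 'R3'): 1}
```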
---------------------------------------------
- Detect Duplicate Nodes – Cleans graph data by detecting and preparing to merge nodes that are likely to represent the same entity.
- Update Network by Merging Nodes – Creates a new network with duplicate nodes merged; run it with both the Merge Table from "Detect Duplicate Nodes" and the original network selected.
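The detect-then-merge workflow can be sketched as follows. This sketch swaps in Python's difflib similarity ratio and a hypothetical 0.9 threshold, which may not match the measure or defaults Sci2 uses. The merge table maps every label to the canonical label it is merged into. Dropping self-loops created by a merge is also an assumption.

```python
from collections import Counter
from difflib import SequenceMatcher


def detect_duplicates(labels, threshold=0.9):
    """Build a merge table: label -> canonical label (difflib ratio is a stand-in measure)."""
    merge_table, canonical = {}, []
    for label in labels:
        match = next(
            (c for c in canonical
             if SequenceMatcher(None, label.lower(), c.lower()).ratio() >= threshold),
            None,
        )
        if match is None:
            canonical.append(label)  # first occurrence becomes the canonical label
            merge_table[label] = label
        else:
            merge_table[label] = match
    return merge_table


def merge_nodes(edges, merge_table):
    """Relabel weighted undirected edges via the merge table, summing merged weights."""
    merged = Counter()
    for (a, b), w in edges.items():
        a2, b2 = merge_table.get(a, a), merge_table.get(b, b)
        if a2 != b2:  # drop self-loops created by merging (assumption)
            merged[tuple(sorted((a2, b2)))] += w
    return merged


table = detect_duplicates(["Smith, J", "Smith, J.", "Doe, A"])
edges = Counter({("Doe, A", "Smith, J"): 1, ("Doe, A", "Smith, J."): 2})
print(merge_nodes(edges, table))  # Counter({('Doe, A', 'Smith, J'): 3})
```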
Preprocessing
General
- Extract Top N% Records – Returns the top N% of a table's rows, given the percentage of rows to keep and the column to sort by.
- Extract Top N Records – Returns the top N rows of a table, given the number of rows to keep and the column to sort by.
- Aggregate Data – Summarizes the input table by a chosen column, aggregating values such as "Cited Reference Count," "Number of Pages," "Publication Year," and "Times Cited," as well as many other delimited values.
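Extract Top N Records and Extract Top N% Records reduce to sorting on one column and slicing. The sketch assumes rows are dicts with a numeric "TC" (times cited) column; the column name is hypothetical. Rounding the N% row count up is an assumption. Tied rows keep their original order because Python's sort is stable, even with reverse=True.

```python
import math


def top_n(rows, column, n):
    """The n rows with the largest values in `column`."""
    return sorted(rows, key=lambda r: r[column], reverse=True)[:n]


def top_n_percent(rows, column, percent):
    """The top `percent`% of rows; the row count is rounded up (assumption)."""
    return top_n(rows, column, math.ceil(len(rows) * percent / 100))


rows = [{"TC": 5}, {"TC": 12}, {"TC": 1}, {"TC": 8}]  # hypothetical "TC" column
print(top_n(rows, "TC", 2))           # [{'TC': 12}, {'TC': 8}]
print(top_n_percent(rows, "TC", 25))  # [{'TC': 12}]
```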
Temporal
Geospatial
Topical
Networks
Analysis
Temporal
- Burst Detection – Determines periods of increased activity in a table with dates/timestamps.
Geospatial
- Geocoder – Converts place names to latitudes and longitudes.
- Congressional District Geocoder – Converts 9-digit U.S. ZIP codes (ZIP+4 codes) into their congressional districts and geographic coordinates (latitude and longitude).
- Yahoo Geocoder – Converts place names and addresses into latitudes and longitudes (requires a Yahoo! API key).
Topical
- Burst Detection – Determines periods of increased activity in a table with dates/timestamps.
Networks
- Network Analysis Toolkit (NAT) – Calculates basic network statistics, such as number of nodes (and isolated nodes), node attributes, number of edges, presence of self loops and parallel edges, average degree ("total," "in," and "out"), strength of component connections, and overall density.
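Most of the NAT statistics follow from a single pass over the edge list. The sketch below works on an undirected graph given as node and edge lists, which is a hypothetical input format. Degree counts a self-loop twice. Density uses the simple-graph formula 2m / (n(n - 1)), so it can exceed 1 when loops or parallel edges are present.

```python
from collections import Counter


def network_summary(nodes, edges):
    """NAT-style counts for an undirected graph given as node and edge lists (hypothetical format)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1  # a self-loop adds 2 to its node's degree
        degree[b] += 1
    pair_counts = Counter(frozenset(e) for e in edges)  # (A, B) and (B, A) are the same pair
    n, m = len(nodes), len(edges)
    return {
        "nodes": n,
        "isolated_nodes": sum(1 for v in nodes if degree[v] == 0),
        "edges": m,
        "self_loops": sum(1 for a, b in edges if a == b),
        "parallel_edges": sum(c - 1 for c in pair_counts.values() if c > 1),
        "average_degree": 2 * m / n if n else 0.0,
        # Simple-graph formula; can exceed 1 with loops or parallel edges.
        "density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
    }


stats = network_summary(["A", "B", "C", "D"], [("A", "B"), ("B", "A"), ("B", "C"), ("C", "C")])
print(stats["isolated_nodes"], stats["self_loops"], stats["parallel_edges"])  # 1 1 1
```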
Unweighted & Undirected
- Node Degree – Calculates the number of edges incident to each node and appends that value to the node.
- Degree Distribution – Builds a histogram of the degree values of all nodes.
---------------------------------------------
- K-Nearest Neighbor (Java) – Calculates the correlation between the degree of a node and that of its neighbors, and then appends that value to each node.
- Watts-Strogatz Clustering Coefficient – Calculates the degree to which each node's neighbors are connected to one another, and then appends that value to each node.
- Watts-Strogatz Clustering Coefficient over K – Correlates the clustering coefficient of the nodes of a network with their degree.
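A node with k neighbors has k(k - 1)/2 possible links among those neighbors. Its Watts-Strogatz clustering coefficient is the fraction of those links actually present. Averaging the coefficient over nodes of equal degree gives the C(k) relation the "over K" variant examines. The sketch below handles an undirected graph. It assumes self-loops are ignored and that nodes with fewer than two neighbors get 0.

```python
from collections import defaultdict
from itertools import combinations


def build_neighbors(edges):
    """Undirected adjacency sets; self-loops are ignored (assumption)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def clustering_coefficients(neighbors):
    """Fraction of each node's neighbor pairs that are themselves linked."""
    coeff = {}
    for v, nbrs in neighbors.items():
        k = len(nbrs)
        if k < 2:
            coeff[v] = 0.0  # undefined for k < 2; reported as 0 here (assumption)
            continue
        links = sum(1 for x, y in combinations(nbrs, 2) if y in neighbors[x])
        coeff[v] = 2 * links / (k * (k - 1))
    return coeff


def clustering_by_degree(neighbors):
    """Average clustering coefficient C(k) over all nodes of degree k."""
    coeff = clustering_coefficients(neighbors)
    groups = defaultdict(list)
    for v, c in coeff.items():
        groups[len(neighbors[v])].append(c)
    return {k: sum(cs) / len(cs) for k, cs in sorted(groups.items())}


nbrs = build_neighbors([("A", "B"), ("B", "C"), ("A", "C"), ("A", "D")])
print(clustering_by_degree(nbrs))  # C(1) = 0.0, C(2) = 1.0, C(3) = 1/3
```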
---------------------------------------------
- Diameter – Calculates the length of the longest shortest path between pairs of nodes in a network.
- Average Shortest Path – Calculates the average length of the shortest path between pairs of nodes in a network.
- Shortest Path Distribution – Builds a histogram of the lengths of shortest paths between pairs of nodes in a network.
- Node Betweenness Centrality – Appends to each node a value reflecting the number of shortest paths that pass through it. The more shortest paths between node pairs a node lies on, the higher its betweenness centrality.
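For an unweighted network, Diameter, Average Shortest Path, and Shortest Path Distribution can all be read off breadth-first searches from every node. The sketch below handles an undirected graph and requires comparable node labels. The source does not say how Sci2 treats unreachable pairs in disconnected graphs; this version skips them, which is an assumption.

```python
from collections import Counter, defaultdict, deque


def bfs_distances(neighbors, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in neighbors[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def path_statistics(edges):
    """Diameter, average shortest path, and path-length histogram (unweighted, undirected)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    histogram = Counter()
    for source in list(neighbors):
        for target, d in bfs_distances(neighbors, source).items():
            if source < target:  # count each unordered, reachable pair once
                histogram[d] += 1
    pairs = sum(histogram.values())
    diameter = max(histogram, default=0)
    average = sum(d * c for d, c in histogram.items()) / pairs if pairs else 0.0
    return diameter, average, dict(sorted(histogram.items()))


print(path_statistics([("A", "B"), ("B", "C"), ("C", "D")]))
# (3, 1.6666666666666667, {1: 3, 2: 2, 3: 1})
```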
---------------------------------------------
- Weak Component Clustering – Extracts the N largest weakly connected components of a network.
- Global Connected Components – Calculates the number of connected components, i.e., maximal subgraphs with a path between each pair of their nodes.
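Weak Component Clustering and Global Connected Components both rest on finding connected components. The depth-first sketch below treats every edge as undirected, which yields the weak components of a directed network. It returns components largest first, so the N largest are simply the first N entries.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Components of an undirected graph (or weak components of a directed one), largest first."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)  # edge direction is ignored
        neighbors[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in neighbors[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(comp)
    return sorted(components, key=len, reverse=True)


comps = connected_components("ABCDEF", [("A", "B"), ("B", "C"), ("D", "E")])
print(len(comps), comps[:2])  # 3 components; the two largest are {A, B, C} and {D, E}
```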
---------------------------------------------
- Extract K-Core – Extracts the kth K-Core from a graph. The kth K-Core is what remains of the graph after every node with fewer than k edges connected to it is removed from the graph recursively.
- Annotate K-Coreness – Appends to each node its coreness, the largest k for which the node belongs to the kth K-Core.
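The recursive removal described for Extract K-Core also yields every node's coreness in a single pass. Repeatedly delete a node of minimum remaining degree, and record the largest such minimum seen so far. The sketch below is a simple O(n^2) version for undirected graphs; a bucket queue would make it linear. Ignoring self-loops is an assumption.

```python
from collections import defaultdict


def coreness(edges):
    """Coreness of every node in an undirected graph (self-loops ignored, an assumption)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    degree = {v: len(nbrs) for v, nbrs in neighbors.items()}
    remaining = set(degree)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)  # node of minimum remaining degree
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in neighbors[v]:
            if w in remaining:
                degree[w] -= 1
    return core


def k_core(edges, k):
    """Edges of the kth K-Core: both endpoints have coreness >= k."""
    core = coreness(edges)
    keep = {v for v, c in core.items() if c >= k}
    return [(a, b) for a, b in edges if a in keep and b in keep]


edges = [("A", "B"), ("B", "C"), ("A", "C"), ("A", "D")]
print(coreness(edges))   # D has coreness 1; A, B, C have coreness 2
print(k_core(edges, 2))  # [('A', 'B'), ('B', 'C'), ('A', 'C')]
```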
---------------------------------------------
- HITS – Computes authority and hub scores for every node.
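HITS alternates two updates: authority(v) is the sum of hub scores of nodes linking to v, and hub(v) is the sum of authority scores of nodes v links to. Both vectors are normalized after each round. The power-iteration sketch below works on directed edges. For an undirected network like the one in this group, listing each edge in both directions is one possible treatment and is an assumption here.

```python
import math


def hits(edges, iterations=100, tol=1e-10):
    """Hub and authority scores by power iteration over directed (source, target) edges."""
    nodes = {v for e in edges for v in e}
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        new_auth = {v: 0.0 for v in nodes}
        for a, b in edges:
            new_auth[b] += hub[a]  # authority: sum of hubs pointing in
        new_hub = {v: 0.0 for v in nodes}
        for a, b in edges:
            new_hub[a] += new_auth[b]  # hub: sum of authorities pointed to
        for scores in (new_auth, new_hub):  # normalize to unit Euclidean length
            norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
            for v in scores:
                scores[v] /= norm
        delta = sum(abs(new_hub[v] - hub[v]) + abs(new_auth[v] - auth[v]) for v in nodes)
        hub, auth = new_hub, new_auth
        if delta < tol:
            break
    return hub, auth


hub, auth = hits([("A", "C"), ("B", "C"), ("A", "D")])
print(max(auth, key=auth.get), max(hub, key=hub.get))  # C A
```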
Weighted & Undirected
Unweighted & Directed
- Node Indegree – Appends the number of incoming edges to each node.
- Node Outdegree – Appends the number of outgoing edges to each node.
- Indegree Distribution – Builds a histogram of the values of the indegree of all nodes.
- Outdegree Distribution – Builds a histogram of the values of the outdegree of all nodes.
---------------------------------------------
- K-Nearest Neighbor – Calculates the correlation between the degree of a node and that of its neighbors, and then appends that value to each node.
- Single Node In-Out Degree Correlations – Calculates the correlations between indegree and outdegree of a node.
---------------------------------------------
- Dyad Reciprocity – Calculates the ratio of dyads with a reciprocated tie to dyads with any tie.
- Arc Reciprocity – Calculates the ratio of reciprocated edges to total edges.
- Adjacency Transitivity – Calculates the ratio of transitive triads to intransitive triads (triads missing one edge).
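Dyad reciprocity and arc reciprocity differ only in what they count: unordered node pairs versus individual edges. The sketch below first collapses duplicate edges and drops self-loops; both steps are assumptions.

```python
def reciprocity(edges):
    """Dyad and arc reciprocity of a directed graph."""
    arcs = {(a, b) for a, b in edges if a != b}  # drop self-loops and duplicates (assumption)
    reciprocated = {(a, b) for a, b in arcs if (b, a) in arcs}
    dyads = {frozenset(arc) for arc in arcs}           # node pairs with any tie
    mutual = {frozenset(arc) for arc in reciprocated}  # node pairs tied both ways
    dyad_reciprocity = len(mutual) / len(dyads) if dyads else 0.0
    arc_reciprocity = len(reciprocated) / len(arcs) if arcs else 0.0
    return dyad_reciprocity, arc_reciprocity


print(reciprocity([("A", "B"), ("B", "A"), ("B", "C"), ("C", "D")]))  # (0.3333333333333333, 0.5)
```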
---------------------------------------------
- Weak Component Clustering – Extracts the N largest weakly connected components of a network.
- Strong Component Clustering – Extracts the N largest strongly connected components of a network.
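Strong Component Clustering needs directed reachability. The sketch below uses Kosaraju's algorithm, which is one standard choice; the source does not say which method Sci2 uses. It uses an explicit stack, so deep graphs do not hit Python's recursion limit. Components come back largest first.

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Kosaraju's algorithm over directed (source, target) edges; largest components first."""
    succ, pred, nodes = defaultdict(list), defaultdict(list), set()
    for a, b in edges:
        succ[a].append(b)
        pred[b].append(a)
        nodes.update((a, b))
    # Pass 1: iterative DFS on the original graph, recording finishing order.
    visited, order = set(), []
    for root in nodes:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            v, it = stack[-1]
            for w in it:
                if w not in visited:
                    visited.add(w)
                    stack.append((w, iter(succ[w])))
                    break
            else:  # all successors explored: v is finished
                stack.pop()
                order.append(v)
    # Pass 2: DFS on the reversed graph in decreasing finishing order.
    assigned, components = set(), []
    for root in reversed(order):
        if root in assigned:
            continue
        comp, stack = set(), [root]
        assigned.add(root)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in pred[v]:
                if w not in assigned:
                    assigned.add(w)
                    stack.append(w)
        components.append(comp)
    return sorted(components, key=len, reverse=True)


edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E"), ("E", "D")]
print(strongly_connected_components(edges))  # the {A, B, C} cycle, then the {D, E} pair
```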
---------------------------------------------
- Extract K-Core – Extracts the kth K-Core from a graph. The kth K-Core is what remains of the graph after every node with fewer than k edges connected to it is removed from the graph recursively.
- Annotate K-Coreness – Appends to each node its coreness, the largest k for which the node belongs to the kth K-Core.
---------------------------------------------
- HITS – Computes authority and hub scores for every node.
- PageRank – Ranks the importance of a node by how many other important nodes point to it.
Weighted & Directed
- HITS – Computes authority and hub scores for every node.
- PageRank – Ranks the importance of a node by how many other important nodes point to it, taking into account edge weights.
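Weighted PageRank splits each node's rank across its out-edges in proportion to edge weight. With every weight equal to 1, it reduces to unweighted PageRank. The power-iteration sketch below takes a list of (source, target, weight) triples. The 0.85 damping factor is the conventional default, not a documented Sci2 setting. Spreading the rank held by nodes without out-edges evenly over all nodes is an assumption.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=100, tol=1e-12):
    """Weighted PageRank over a list of (source, target, weight) triples."""
    nodes = set()
    out_weight = defaultdict(float)
    for a, b, w in edges:
        nodes.update((a, b))
        out_weight[a] += w
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank on dangling nodes (no out-edges) is spread evenly (assumption).
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for a, b, w in edges:
            new[b] += damping * rank[a] * w / out_weight[a]  # share proportional to weight
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


edges = [("A", "B", 3), ("A", "C", 1), ("B", "A", 1), ("C", "A", 1)]
rank = pagerank(edges)
print(rank["B"] > rank["C"])  # True: A sends three quarters of its rank to B
```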
Modeling
Networks
- Random Graph – Generates a graph with a fixed number of nodes connected randomly by undirected edges.
- Watts-Strogatz Small World – Generates a graph in which most nodes are not directly connected to one another but can still reach one another via relatively few edges.
- Barabási-Albert Scale-Free – Generates a scale-free network by incorporating growth and preferential attachment.
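Preferential attachment can be sketched with a list in which each node appears once per edge it touches. A uniform draw from that list then picks nodes with probability proportional to degree. Each new node attaches to m distinct existing nodes. Seeding with a star on the first m + 1 nodes is one common choice and is an assumption here; Sci2 may seed differently.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Undirected preferential-attachment graph on nodes 0..n-1; returns an edge list."""
    if m < 1 or m >= n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    edges = []
    targets = list(range(m))  # star seed (assumption): node m first links to nodes 0..m-1
    repeated = []  # each node listed once per incident edge
    for source in range(m, n):
        edges.extend((source, t) for t in targets)
        repeated.extend(targets)
        repeated.extend([source] * m)
        # Draw m distinct targets with probability proportional to degree.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return edges


edges = barabasi_albert(100, 2, seed=1)
print(len(edges))  # 196, i.e. (n - m) * m
```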
-------------------------------------
- TARL (Topics, Aging and Recursive Linking) – Incorporates "aging" to generate bipartite coevolving networks of authors and papers. It can also be applied to other datasets with different aging distributions.
Visualization
General
Temporal
Geospatial
- Geo Map (Circle Annotations) – Generates a map of the US or the world upon which circles of user-defined size and color are projected. Result is a PostScript file.
- Geo Map (Colored-Region Annotations) – Generates a map of the US or the world with regions colored based on a user-defined metric. Result is a PostScript file.
Networks
- GUESS – Interactive data analysis and visualization tool.
---------------------------------------------
- Radial Tree/Graph (prefuse alpha) – A single node is placed at the center and all others are laid out around it in a tree structure.
- Radial Tree/Graph with Annotation (prefuse beta) – A single node is placed at the center and all others are laid out around it in a tree structure, with labels.
- Tree View (prefuse beta) – Visualizes directory hierarchies in a tree structure. Warning: Does not work on Macs.
- Tree Map (prefuse beta) – Visualizes hierarchies using the Treemap algorithm. Warning: Does not work on Macs.
- Force Directed with Annotation (prefuse beta) – Sorts randomly placed nodes into a more aesthetically pleasing visual layout.
- Fruchterman-Reingold with Annotation (prefuse beta) – Lays out nodes by simulating attractive forces between connected nodes and repulsive forces between all pairs of nodes.
---------------------------------------------
- DrL (VxOrd) – A force-directed graph layout toolbox focused on real-world large-scale graphs.
- Specified (prefuse beta) – Visualization tool for use with graphs having pre-specified node coordinates.
---------------------------------------------
- Circular Hierarchy – Generates a circular visualization of the output produced by a multi-level aggregation method such as Blondel Community Detection. Result is a PostScript file.