The visual analytics tool Intergraph is in active development and offers a novel approach to the exploration of large graphs by means of an iterative search and discovery workflow.
Intergraph was first conceived as a technical demonstrator in the ANR/FNR-funded research project BLIZAAR – Hybrid Visualization of Dynamic Multilayer Graphs. The current demonstrator is built around a network dataset based on 13,000 documents which describe the process of European integration. This dataset was originally created during the histograph project. It treats automatically detected named entities (persons, locations and institutions) as nodes and their co-occurrences in documents as edges. The current demonstrator can be accessed and tested here: https://blizaar-lab.recherche.cergy.eisti.fr/
With this call we are looking for scholars who have datasets they wish to explore by means of network visualisations. Limited resources are available to help with generating such data and integrating it into Intergraph.
If you are interested, please contact Stefan Bornhofen (sb[ät]eisti.fr) and send a brief description of your (network) dataset and a set of research questions you would like to address with the help of Intergraph. No particular technical skills are required.
Below we provide descriptions of Intergraph functionalities and the technical requirements for a use case.
Corresponding grant references: ANR grant BLIZAAR ANR-15-CE23-0002-01 and FNR grant BLIZAAR INTER/ANR/14/9909176.
Intergraph at a glance
Main Features:
- Create multimodal subgraphs from a large graph by means of search & suggestions for relevant nodes
- Handle data imperfections such as node duplicates and wrong node type attribution
- Visualise and compare multiple subgraphs in 2D/3D
- Observe changes over time and filter for node types
- Link nodes and edges to underlying references (e.g. texts)
What sets Intergraph apart from other graph visualisation tools such as Gephi, Visone or Netdraw:
- Focus on subgraph creation and comparison of graphs, exploration of imperfect data
- In active development; will be adjusted to your specific needs (e.g. add node centrality or clustering analyses)
- Maintains links between the data visualisation and the underlying resources
- (Limited) resources available to generate network data from existing resources (details below).
Visualisations of large graphs can quickly become overwhelmingly complex and hard to make sense of. Intergraph therefore encourages users to create and inspect subnetworks with entities relevant to their current research interest.
The main idea of Intergraph is to begin the exploration of a large dataset with a keyword search for a number of start nodes.
Following the expand-on-demand principle, the user will encounter and inspect new relevant nodes in the resulting initial graph and pursue their exploration by conveniently creating additional graphs stemming from the existing ones. This path of exploration yields a sequence of linked canvases such as those shown in the screenshot below.
Graphs can be dynamically added to and deleted from the scene. They are rendered on free-floating planes which can be arbitrarily translated, oriented and scaled. Depending on the user tasks and preferences, the scene can be viewed in a 2D or a 3D perspective. Two-dimensional views are known to be very efficient for visual data exploration. The third dimension allows the user to stack planar graph layers in space and notably to create communicative 2.5D visualizations.
Node colors reflect the node type, and sizes indicate the number of underlying resources. A click on a node or edge gives immediate access to these resources.
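The visual encoding described above can be sketched as follows. This is only an illustration: the colour palette, the field names (`type`, `resources`) and the square-root scaling of node sizes are assumptions and are not taken from the Intergraph source code.

```javascript
// Hypothetical colour palette; the actual Intergraph colours may differ.
const TYPE_COLORS = {
  person: 0x1f77b4,
  location: 0x2ca02c,
  institution: 0xff7f0e,
};
const DEFAULT_COLOR = 0x999999;

// Map a node record to a colour (by type) and a size (by resource count).
// Assumption: the size grows with the square root of the resource count,
// so that the drawn area is roughly proportional to the count.
function nodeStyle(node, baseSize = 1) {
  const color = TYPE_COLORS[node.type] ?? DEFAULT_COLOR;
  const size = baseSize * Math.sqrt(Math.max(1, node.resources.length));
  return { color, size };
}

const style = nodeStyle({
  id: "n1",
  type: "person",
  resources: ["r1", "r2", "r3", "r4"],
});
console.log(style);
```

Scaling by the square root is one common choice to keep heavily referenced nodes from dominating the canvas; a linear or logarithmic scale would work equally well in this sketch.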
New graphs are typically produced by querying 1.5D ego-networks of existing nodes, via easy-to-communicate operations such as
- All entities co-appearing with a given entity
- All collections mentioning a given entity
- All entities mentioned in a given collection
- All collections sharing resources with a given collection
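As a minimal sketch of such an operation, the following function extracts an ego-network from an in-memory edge list. It assumes that "1.5D" refers to the ego node, its direct neighbours, and the edges among those neighbours; the function name and the toy data are hypothetical, and the real tool queries a database instead.

```javascript
// Toy undirected co-occurrence edges; the node names are illustrative only.
const edges = [
  ["A", "B"],
  ["A", "C"],
  ["B", "C"],
  ["C", "D"],
  ["D", "E"],
];

// Return the ego, its direct neighbours, and every edge among these nodes.
function egoNetwork(edgeList, ego) {
  const nodes = new Set([ego]);
  for (const [u, v] of edgeList) {
    if (u === ego) nodes.add(v);
    if (v === ego) nodes.add(u);
  }
  // Keep only edges whose two endpoints both belong to the ego-network.
  const kept = edgeList.filter(([u, v]) => nodes.has(u) && nodes.has(v));
  return { nodes: [...nodes].sort(), edges: kept };
}

const egoA = egoNetwork(edges, "A");
console.log(egoA.nodes, egoA.edges);
```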
If the same node appears on two or more graphs of the scene, interedges are drawn (see Screenshot). This allows users to create their own interest-driven search and discovery paths across the dataset.
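The detection of interedges can be illustrated with a short sketch. It assumes that every pair of graphs in the scene is compared and that nodes are matched by a shared identifier; the data layout and the function name are hypothetical.

```javascript
// Each canvas holds one graph; only the node ids matter for this sketch.
const scene = [
  { id: "g1", nodes: ["Adenauer", "Schuman", "Paris"] },
  { id: "g2", nodes: ["Schuman", "Monnet"] },
  { id: "g3", nodes: ["Monnet", "Paris", "Schuman"] },
];

// For every pair of graphs, emit one interedge per shared node id.
function interedges(graphs) {
  const result = [];
  for (let i = 0; i < graphs.length; i++) {
    const seen = new Set(graphs[i].nodes);
    for (let j = i + 1; j < graphs.length; j++) {
      for (const node of graphs[j].nodes) {
        if (seen.has(node)) {
          result.push({ node, from: graphs[i].id, to: graphs[j].id });
        }
      }
    }
  }
  return result;
}

const links = interedges(scene);
console.log(links);
```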
Results of new queries first appear as a list in the left pane. This first kind of visualization, which itemizes only the nodes without the edges, may in some cases already be sufficient to work with. The list view lets users decide whether it is worth generating the graph or whether the list of nodes should be recompiled first, e.g. to add missing nodes or to exclude nodes from the graph using a so-called cull list. A graph of a given node list, or a part of it, can be generated on demand, and it is added to the scene on the right side of the interface.
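A cull list can be thought of as a simple exclusion filter applied to the result list before a graph is generated. The sketch below assumes that nodes carry an `id` field and that the cull list is a plain array of ids; both are illustrative choices.

```javascript
// Remove every node whose id is on the cull list.
function applyCullList(resultList, cullList) {
  const culled = new Set(cullList);
  return resultList.filter((node) => !culled.has(node.id));
}

const results = [
  { id: "e1", label: "Robert Schuman", type: "person" },
  { id: "e2", label: "R. Schuman", type: "person" }, // suspected duplicate
  { id: "e3", label: "Luxembourg", type: "location" },
];

const kept = applyCullList(results, ["e2"]);
console.log(kept.map((node) => node.label));
```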
Graphs can be submitted to a filter which operates on resource type and time. Subgraphs of a given resource type can provide a better understanding of its distribution within the corpus. Subgraphs considering the resources of a specific time window allow assessing the relevance and interconnections of entities during a considered period. The user can also shift the time window and get an animated representation of the dynamic graph.
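The filter logic can be sketched as follows, assuming that each edge lists the ids of its supporting resources and that dates are stored as ISO strings. The field names, the yearly step of the animation and the function names are hypothetical.

```javascript
// Toy resources; ISO date strings compare correctly as plain strings.
const resources = {
  r1: { type: "letter", date: "1950-05-09" },
  r2: { type: "photo", date: "1951-04-18" },
  r3: { type: "letter", date: "1957-03-25" },
};
const graphEdges = [
  { source: "A", target: "B", resources: ["r1", "r2"] },
  { source: "B", target: "C", resources: ["r3"] },
];

// Keep an edge only if at least one supporting resource passes the filter.
function filterEdges(edgeList, { from, to, type = null }) {
  const passes = (id) => {
    const r = resources[id];
    return r.date >= from && r.date <= to && (type === null || r.type === type);
  };
  return edgeList
    .map((e) => ({ ...e, resources: e.resources.filter(passes) }))
    .filter((e) => e.resources.length > 0);
}

// Shifting a window of `width` years one year at a time yields the
// frames of an animated view of the dynamic graph.
function frames(edgeList, firstYear, lastYear, width) {
  const out = [];
  for (let y = firstYear; y + width - 1 <= lastYear; y++) {
    const window = { from: `${y}-01-01`, to: `${y + width - 1}-12-31` };
    out.push({ year: y, edges: filterEdges(edgeList, window) });
  }
  return out;
}

const early = filterEdges(graphEdges, { from: "1950-01-01", to: "1951-12-31" });
const animation = frames(graphEdges, 1950, 1957, 2);
console.log(early.length, animation.length);
```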
If time-to-time mapping, i.e. animation, is not convenient for analyzing the evolution of a network over time, time-to-space mapping is also possible. For this purpose the user can clone and “freeze” a graph of the scene, meaning that its current filter is fixed. In this way, several graphs with the same nodes but distinct time periods can be juxtaposed (2D) or superimposed (3D) in space.
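A minimal sketch of cloning and freezing, under the assumption that a graph object holds a node list and a filter with `from` and `to` years. The object layout, the spacing value and the function names are hypothetical.

```javascript
// Clone a graph and fix its current filter, so that later changes to the
// original graph no longer affect the copy.
function freezeGraph(graph, position) {
  return {
    nodes: [...graph.nodes],
    filter: Object.freeze({ ...graph.filter }),
    frozen: true,
    position, // where the copy's plane is placed in the scene
  };
}

const SPACING = 10; // hypothetical distance between planes

// Time-to-space mapping: one frozen copy per period, either stacked
// along z (3D, superimposed) or placed side by side along x (2D, juxtaposed).
function timeToSpace(graph, periodList, mode) {
  return periodList.map(([from, to], i) => {
    graph.filter = { from, to }; // the user shifts the time window
    const position =
      mode === "3d"
        ? { x: 0, y: 0, z: i * SPACING }
        : { x: i * SPACING, y: 0, z: 0 };
    return freezeGraph(graph, position);
  });
}

const live = { nodes: ["A", "B", "C"], filter: { from: 1950, to: 1954 } };
const periods = [[1950, 1954], [1955, 1959], [1960, 1964]];
const stacked = timeToSpace(live, periods, "3d");
console.log(stacked.map((g) => g.filter));
```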
Technical description
Architecture
Intergraph is written in JavaScript and runs in a web browser. The client communicates with a Node.js server, which queries graph data from a Neo4j database. Once the client has received the queried data in JSON format, the graphs are rendered in a 3D scene using the Three.js graphics library.
An instance of the Intergraph server is currently installed at EISTI and available at https://blizaar-lab.recherche.cergy.eisti.fr/
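To give an idea of the kind of request a server might send to the database, the following sketch builds a parameterised Cypher query for "all entities co-appearing with a given entity". The node labels (`Entity`, `Resource`), the `name` property and the function name are assumptions made for illustration; only the `appears_in` relationship name is taken from the demo dataset, and the actual Intergraph queries may look different.

```javascript
// Build a Cypher query and its parameters without touching a database.
// Passing the entity name as a parameter ($name) avoids string
// concatenation and the injection problems that come with it.
function coAppearanceQuery(entityName) {
  const text = [
    "MATCH (e:Entity {name: $name})-[:appears_in]->(r:Resource)",
    "      <-[:appears_in]-(other:Entity)",
    "RETURN other.name AS name, count(DISTINCT r) AS sharedResources",
    "ORDER BY sharedResources DESC",
  ].join("\n");
  return { text, params: { name: entityName } };
}

const query = coAppearanceQuery("Robert Schuman");
console.log(query.text);
console.log(query.params);
```

The resulting object could then be handed to a Neo4j client library; that step is omitted here because it requires a running database.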
Demo Dataset
The histograph dataset models a number of “entities” occurring in a collection of “resources”.
Resources are multimedia documents which are time-stamped by their historic publication date. For each resource, metadata about its provenance is available. The resources differ significantly in nature: among them are newspaper articles, diplomatic notes, personal memoirs, audio interview transcripts, cartoons and photos with descriptive captions. Resources are part of one or more collections, from the highest logical unit of thematic corpora (ePublications) down to units and subunits. This affiliation is modeled in the graph by the “is_part_of” relationship.
Entities (persons, locations, organizations, themes) have been extracted from the resources using Named-Entity Recognition software such as YAGO and TextRazor. This process allowed generating the “appears_in” relationship in the graph.
Based on the two fundamental relationships “appears_in” and “is_part_of”, other relationships have been derived: Entities “co-appear” in resources, and collections “share” resources, by bipartite network projection. Finally, a collection “mentions” an entity, and accordingly the entity “is_mentioned_in” the collection, if the collection contains at least one resource where the entity appears.
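The derivation of these relationships can be sketched on toy data. The example assumes that both base relationships are available as lists of pairs; the identifiers and function names are illustrative, and the edge weights (number of shared items) are an assumption about how the projection might be counted.

```javascript
// Toy data: [entity, resource] and [resource, collection] pairs.
const appearsIn = [
  ["Schuman", "r1"], ["Adenauer", "r1"],
  ["Schuman", "r2"], ["Adenauer", "r2"], ["Monnet", "r2"],
];
const isPartOf = [["r1", "c1"], ["r2", "c1"], ["r2", "c2"]];

// Group the left-hand items of [left, right] pairs by their right-hand item.
function groupByRight(pairs) {
  const groups = new Map();
  for (const [left, right] of pairs) {
    if (!groups.has(right)) groups.set(right, new Set());
    groups.get(right).add(left);
  }
  return groups;
}

// Bipartite projection: two left-hand items are linked once per shared
// right-hand item. Keys have the form "a|b", so ids must not contain "|".
function project(pairs) {
  const weights = new Map();
  for (const members of groupByRight(pairs).values()) {
    const sorted = [...members].sort();
    for (let i = 0; i < sorted.length; i++) {
      for (let j = i + 1; j < sorted.length; j++) {
        const key = `${sorted[i]}|${sorted[j]}`;
        weights.set(key, (weights.get(key) || 0) + 1);
      }
    }
  }
  return weights;
}

// A collection mentions an entity if one of its resources contains it.
function mentions(entityPairs, collectionPairs) {
  const entitiesByResource = groupByRight(entityPairs);
  const result = new Map();
  for (const [resource, collection] of collectionPairs) {
    if (!result.has(collection)) result.set(collection, new Set());
    for (const entity of entitiesByResource.get(resource) || []) {
      result.get(collection).add(entity);
    }
  }
  return result;
}

const coAppear = project(appearsIn); // entities linked via resources
const share = project(isPartOf.map(([r, c]) => [c, r])); // collections linked via resources
const mentioned = mentions(appearsIn, isPartOf);
console.log(coAppear, share, mentioned);
```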
New datasets
We recommend providing new datasets in a human-readable text file format. A script can then be written to import the data into a Neo4j database.
New datasets can be easily integrated into and visualized by the Intergraph platform if their data structure corresponds to that of the demo dataset, i.e. ideally the data consists of
- Entities (of one or more types) appearing in
- an (organized or unorganized) collection of
- (timestamped) (multimedia) resources (with further meta-data).
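To make this structure concrete, the sketch below shows one possible JSON-like layout for such a dataset, together with a basic consistency check and a conversion into the two base relationships. The field names, the accepted date formats and the function names are assumptions for illustration only and do not describe a required Intergraph input format.

```javascript
// Hypothetical dataset layout: resources with optional dates, collections
// and the entities that appear in them.
const sample = {
  resources: [
    {
      id: "r1",
      date: "1950-05-09",
      type: "newspaper article",
      collections: ["c1"],
      entities: [
        { name: "Robert Schuman", type: "person" },
        { name: "Paris", type: "location" },
      ],
    },
    { id: "r2", date: null, type: "photo", collections: [], entities: [] },
  ],
};

// Return a list of problems; an empty list means the structure looks usable.
function checkDataset(dataset) {
  const problems = [];
  const ids = new Set();
  for (const r of dataset.resources || []) {
    if (!r.id) problems.push("resource without id");
    else if (ids.has(r.id)) problems.push(`duplicate resource id ${r.id}`);
    else ids.add(r.id);
    // Dates are optional; if present, accept YYYY, YYYY-MM or YYYY-MM-DD.
    if (r.date != null && !/^\d{4}(-\d{2}(-\d{2})?)?$/.test(r.date)) {
      problems.push(`${r.id}: unreadable date`);
    }
    for (const e of r.entities || []) {
      if (!e.name || !e.type) problems.push(`${r.id}: entity needs a name and a type`);
    }
  }
  return problems;
}

// Flatten the records into [entity, resource] and [resource, collection] pairs.
function toRelationships(dataset) {
  const appearsIn = [];
  const isPartOf = [];
  for (const r of dataset.resources) {
    for (const e of r.entities || []) appearsIn.push([e.name, r.id]);
    for (const c of r.collections || []) isPartOf.push([r.id, c]);
  }
  return { appearsIn, isPartOf };
}

const problems = checkDataset(sample);
const relationships = toRelationships(sample);
console.log(problems, relationships);
```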
If you are unsure whether your data matches these criteria, do not hesitate to get in touch with Stefan Bornhofen (sb[ät]eisti.fr).