We start our analysis with those orphan diseases that have at least one implicated gene mutation. Our data set is thus limited to 2010 ODs and 2121 orphan disease genes (ODGs). A gene and an OD are considered connected if a known mutation in that gene is implicated as a causal mutation for the OD. We downloaded this information from the Orphanet and OMIM databases through the UniProt Knowledgebase interface. Starting from the resulting OD-gene bipartite network and the protein interactions of the human interactome, we first built and analyzed three types of networks:
- Orphan Disease Network (ODN)
- Orphan Disease Gene Network (ODGN)
- Orphan Disease Gene Interactome (ODGI)
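The section does not spell out how the three networks are derived from the OD-gene bipartite network. A common construction, and the one assumed in this minimal Python sketch, is: the ODN links two ODs that share at least one causal gene, the ODGN links two genes implicated in at least one common OD, and the ODGI keeps only those human-interactome edges whose endpoints are both ODGs. The function name `build_networks` and its input formats (a dict from OD to gene set, a list of interaction pairs) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_networks(od_genes, ppi_edges):
    """Build ODN, ODGN and ODGI edge sets under the assumed definitions.

    od_genes:  dict mapping OD id -> set of causal genes (the bipartite network)
    ppi_edges: iterable of (gene_a, gene_b) protein interactions
    """
    # ODN (assumed): two ODs are linked if they share at least one causal gene;
    # the edge weight is the number of shared genes.
    odn = {}
    for a, b in combinations(sorted(od_genes), 2):
        shared = od_genes[a] & od_genes[b]
        if shared:
            odn[(a, b)] = len(shared)

    # Invert the bipartite network: gene -> set of ODs it is implicated in.
    gene_ods = defaultdict(set)
    for od, genes in od_genes.items():
        for g in genes:
            gene_ods[g].add(od)

    # ODGN (assumed): two genes are linked if they share at least one OD;
    # the edge weight is the number of shared ODs.
    odgn = {}
    for g, h in combinations(sorted(gene_ods), 2):
        shared = gene_ods[g] & gene_ods[h]
        if shared:
            odgn[(g, h)] = len(shared)

    # ODGI (assumed): interactome edges restricted to orphan disease genes,
    # stored as sorted pairs so (A, B) and (B, A) collapse; self-loops dropped.
    odgs = set(gene_ods)
    odgi = set()
    for g, h in ppi_edges:
        if g != h and g in odgs and h in odgs:
            odgi.add(tuple(sorted((g, h))))

    return odn, odgn, odgi
```

For example, with `od_genes = {"OD1": {"A", "B"}, "OD2": {"B", "C"}}` the ODN contains the single edge `("OD1", "OD2")` with weight 1, because the two ODs share gene B. The pairwise comparison is quadratic in the number of ODs or genes, which is still cheap at the scale of roughly 2000 nodes.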
In the second stage, we selected the subset of ODs with four or more causal genes and connected them based on shared enriched features (e.g., biological processes, cellular components, pathways, or literature citations). In this network, two ODs can be connected by a shared feature even if they do not share a gene. Functional enrichment analyses were performed using the ToppGene and ToppCluster servers. Finally, using the literature cited in the orphan disease records, we constructed a document-based OD network and compared it with the gene-based OD network.
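The feature-based and document-based networks follow the same pattern: keep the ODs of interest and link any two that share at least one feature, regardless of their genes. The sketch assumes that the enriched features (for example GO terms or pathways returned by ToppCluster) or the cited PubMed IDs have already been collected into one set per OD. The names `shared_feature_network`, `min_genes`, and `min_shared` are hypothetical, and weighting edges by the number of shared features is an illustrative choice, not necessarily the one used in the study.

```python
from itertools import combinations


def shared_feature_network(od_features, od_genes, min_genes=4, min_shared=1):
    """Link ODs that share enriched features or cited documents.

    od_features: dict OD id -> set of feature ids (e.g. enriched GO terms,
                 pathways, or PubMed IDs), assumed to be precomputed.
    od_genes:    dict OD id -> set of causal genes, used only for the
                 minimum-gene filter; gene overlap is NOT required for an edge.
    """
    # Keep ODs with enough causal genes that also have feature annotations.
    selected = sorted(
        od for od, genes in od_genes.items()
        if len(genes) >= min_genes and od in od_features
    )
    edges = {}
    for a, b in combinations(selected, 2):
        shared = od_features[a] & od_features[b]
        # Edge weight = number of shared features (illustrative choice).
        if len(shared) >= min_shared:
            edges[(a, b)] = len(shared)
    return edges
```

Calling it with the default `min_genes=4` applies the four-or-more-causal-genes filter; passing citation sets with `min_genes=0` instead yields a document-based network over every OD present in both dicts.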
All networks and related analyses in the current study were created using Gephi. Users can interactively query the different networks for genes or ODs of interest. Because each graph contains around 1000 nodes, it is difficult to explore it with an interactive Web application. We therefore used Gephi's Seadragon export plug-in, which renders the graph as a zoomable bitmap that we consider well suited to visual exploration. Because each global network is disconnected, we also display a summary network of its weakly connected components (only around 100 nodes, each representing one connected component) with GexfWalker, an interactive Flash Web application that lets the user explore the graph and inspect each node, its neighbors, and its attributes.
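A weakly connected component is a maximal set of nodes that are mutually reachable when edge directions are ignored. As a hedged sketch of how the roughly 100 component nodes could be derived, the Python below finds the components with union-find and emits one summary record per component (index, size, members). The function names are hypothetical, and the text does not say how, or whether, the summary nodes are linked to each other, so the sketch only builds the nodes.

```python
from collections import defaultdict


def weak_components(nodes, edges):
    """Weakly connected components via union-find (edge direction ignored).

    Every edge endpoint must appear in `nodes`.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the sets of the two endpoints of every edge.
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    # Largest components first; ties keep first-seen order (stable sort).
    return sorted(groups.values(), key=len, reverse=True)


def component_summary(nodes, edges):
    """One summary node per component: (component index, size, sorted members)."""
    return [(i, len(c), sorted(c)) for i, c in enumerate(weak_components(nodes, edges))]
```

Union-find keeps the pass over the edge list close to linear, so it handles graphs of a few thousand nodes without a graph library; `networkx.weakly_connected_components` is an off-the-shelf alternative for directed graphs.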