Biograph

Drug Repositioning Application

By Sandeep Shantharam

Introduction

Biograph is an application being built for Drug-Repositioning:

  • A frontend interface for "curation" by the MURI team.
  • A backend application with ElasticSearch for searching "entities".
  • A backend application with a graph database (Neo4j) for mapping "relationships".
  • A data analytics platform for performing analysis on the Biograph datastore.
  • The backend application exposes REST API endpoints, and the frontend application will be built with the Angular framework.

Problem - Data Integration

  • Entities in the application are "gene", "protein", "chemical", "pathway" and "disease".
  • The problem with most biological databases is the lack of quality or quantity of these entities.
  • The Biograph application plans to solve this by integrating these databases.
  • During data integration, the choice of which entities to include and exclude will determine the quality of the final datastore.

Solution - ElasticSearch

  • ElasticSearch is a database that can be used to unify the entities and streamline the elimination of duplicate entities.
  • ElasticSearch is important for curation purposes.
  • ElasticSearch will increase the rate at which curation is done and also help eliminate duplication in relationships.
  • The backend could also be used for text analysis and automatic curation in the future.
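
The duplicate-elimination idea above can be sketched in plain Python. This is only an in-memory stand-in for what a search over names and synonyms would do, not ElasticSearch itself; the helper names (`normalize`, `find_duplicates`) and the sample records are assumptions made for illustration.

```python
from collections import defaultdict


def normalize(term):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(term.lower().split())


def find_duplicates(entities):
    """Group bids of same-type entities that share a normalized name or synonym."""
    seen = defaultdict(set)  # (type, normalized term) -> set of bids
    for entity in entities:
        terms = [entity["name"], *entity.get("secondary", [])]
        for term in terms:
            seen[(entity["type"], normalize(term))].add(entity["bid"])
    # Keep only the terms that more than one bid maps to.
    return {key: sorted(bids) for key, bids in seen.items() if len(bids) > 1}


# Illustrative records only; the bids are made up.
entities = [
    {"bid": 1, "type": "chemical", "name": "Aspirin",
     "secondary": ["Acetylsalicylic acid"]},
    {"bid": 2, "type": "chemical", "name": "acetylsalicylic  acid",
     "secondary": []},
    {"bid": 3, "type": "gene", "name": "TP53", "secondary": ["p53"]},
]
duplicates = find_duplicates(entities)
```

Here entities 1 and 2 are flagged as candidates for merging because one's synonym matches the other's name. A curator would still decide whether they really are the same entity.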

Schema - ElasticSearch


{"index": {"_index": "ctdbio", "_type": "string"}} 
## "_type" is one of ["chemical","gene","protein","pathway","disease"]
        
{
  "bid": bid,              ## Primary unique number that will represent the type
  "type": "string",        ## ["chemical","gene","protein","pathway","disease"]
  "name": name,
  "primary": {dictionary},
  "secondary": synonymns,  ## List of synonyms
  "tag": [list],           ## Tags for the entity
  "text": "string"         ## Text that is relevant to the entity
}

Schema - Chemical

sc_line = {
    "bid": bid,
    "type": "chemical",
    "name": name,
    "primary": {"drugbank": drugbank, "cas": cas},
    "secondary": synonymns,
    "tag": [],
    "text": chemtext
}

Schema - Gene

sc_line = {
    "bid": bid,
    "type": "gene",
    "name": name,
    "primary": {
        "biogrid": biogrid,
        "pharmagkb": pharmagkb,
        "uniprotid": uniprotid,
        "gene_symbol": gen_sym,
        "alt_gene_ids": alt_gene
    },
    "secondary": synonymns,
    "tag": [],
    "text": ""
}

Schema - Disease

sc_line = {
    "bid": bid,
    "type": "disease",
    "name": name,
    "primary": {"alt_dis": alt_dis, "medic_terms": medic},
    "secondary": synonymns,
    "tag": [],
    "text": distext
}

Schema - Pathway

sc_line = {
    "bid": bid,
    "type": "pathway",
    "name": name,
    "primary": {},
    "secondary": [],
    "tag": [],
    "text": ""
}
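
As a rough sketch, the schemas above could be turned into the two-line action/document pairs that the ElasticSearch bulk API expects. The function names `make_entity` and `to_bulk_lines` are hypothetical, and using the entity type as `_type` is an assumption based on the header comment in the generic schema. Note that recent ElasticSearch releases no longer support `_type`, so this layout applies to older versions.

```python
import json

ENTITY_TYPES = {"chemical", "gene", "protein", "pathway", "disease"}


def make_entity(bid, entity_type, name, primary=None, secondary=None,
                tag=None, text=""):
    """Build one document following the generic entity schema."""
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {entity_type!r}")
    return {
        "bid": bid,
        "type": entity_type,
        "name": name,
        "primary": primary or {},      # identifiers keyed by source name
        "secondary": secondary or [],  # list of synonyms
        "tag": tag or [],
        "text": text,
    }


def to_bulk_lines(entity, index="ctdbio"):
    """Return the action line and the document line as newline-delimited JSON."""
    # Assumption: "_type" carries the entity type, as the schema comment suggests.
    action = {"index": {"_index": index, "_type": entity["type"]}}
    return json.dumps(action) + "\n" + json.dumps(entity) + "\n"


# Hypothetical pathway entry with empty identifier and synonym fields.
bulk_body = to_bulk_lines(make_entity(1, "pathway", "Example pathway"))
```

Concatenating the output for every parsed entity gives a body that could be posted to the `_bulk` endpoint. Validating the type up front keeps malformed records out of the index.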

Problem - Graph Datastore

  • Entities in the graph datastore are the same: "gene", "protein", "chemical", "pathway" and "disease".
  • Existing biological databases typically map only a single relationship between entities.
  • The Biograph application plans to solve this by mapping relationships across different entities, together with path information.
  • In building the datastore, the quality of the database will depend on the expressiveness of the queries and their usefulness to biologists.

Solution - Neo4j

  • Neo4j is a database that will be used to store the data in a network (graph) format.
  • Neo4j is important because it makes some data analytics, such as node degree and path identification, easy for the application.
  • Neo4j will help in storing the relationships created by the curators.
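
To make "node degree" and "path identification" concrete, here is a small pure-Python sketch over an in-memory edge list. It is not how Neo4j computes these, only an illustration of the two operations. The node identifiers are made up, edges are treated as undirected, and the relationship names follow the `chemical_gene` style used in the graph.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Turn (source, relationship_type, target) triples into an undirected adjacency map."""
    adjacency = defaultdict(set)
    for source, _rel_type, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def node_degree(adjacency, node):
    """Number of distinct neighbours of a node."""
    return len(adjacency.get(node, ()))


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the list of nodes on one shortest path, or None."""
    if start == goal:
        return [start]
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in sorted(adjacency.get(current, ())):
            if neighbour in previous:
                continue
            previous[neighbour] = current
            if neighbour == goal:
                # Walk back through the predecessors to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Hypothetical edges: a chemical linked to a disease through a gene and a pathway.
edges = [
    ("chem:1", "chemical_gene", "gene:1"),
    ("gene:1", "gene_pathway", "path:1"),
    ("path:1", "disease_pathway", "dis:1"),
]
adjacency = build_adjacency(edges)
path = shortest_path(adjacency, "chem:1", "dis:1")
```

A path like this one, from a chemical to a disease through intermediate entities, is the kind of multi-step relationship that single-relationship databases do not expose directly.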

Schema - Neo4j

{  
  "table":{  
    "_response":{  
      "columns":[  
        "r"
      ],
      "data":[  
        {  
          "row":[  
            {  
              "name":"DNA segment, 03B03F (Research Genetics)",
              "bid":"27777"
            }
          ],
          "graph":{  
            "nodes":[  
              {  
                "id":"155926",
                "labels":[  
                  "gene"
                ],
                "properties":{  
                  "name":"DNA segment, 03B03F (Research Genetics)",
                  "bid":"27777"
                }
              }
            ],
            "relationships":[

            ]
          }
        },
        {  
          "row":[  
            {  
              "name":"DNA segment, 03B03R (Research Genetics)",
              "bid":"27778"
            }
          ],
          "graph":{  
            "nodes":[  
              {  
                "id":"155927",
                "labels":[  
                  "gene"
                ],
                "properties":{  
                  "name":"DNA segment, 03B03R (Research Genetics)",
                  "bid":"27778"
                }
              }
            ],
            "relationships":[  

            ]
          }
        },
        {  
          "row":[  
            {  
              "name":"DNA segment, 03.MMHAP34FRA.seq",
              "bid":"53288"
            }
          ],
          "graph":{  
            "nodes":[  
              {  
                "id":"155928",
                "labels":[  
                  "gene"
                ],
                "properties":{  
                  "name":"DNA segment, 03.MMHAP34FRA.seq",
                  "bid":"53288"
                }
              }
            ],
            "relationships":[  

            ]
          }
        }
      ],
      "stats":{  
        "contains_updates":false,
        "nodes_created":0,
        "nodes_deleted":0,
        "properties_set":0,
        "relationships_created":0,
        "relationship_deleted":0,
        "labels_added":0,
        "labels_removed":0,
        "indexes_added":0,
        "indexes_removed":0,
        "constraints_added":0,
        "constraints_removed":0
      }
    },
    "nodes":[  
      {  
        "id":"155926",
        "labels":[  
          "gene"
        ],
        "properties":{  
          "name":"DNA segment, 03B03F (Research Genetics)",
          "bid":"27777"
        }
      },
      {  
        "id":"155927",
        "labels":[  
          "gene"
        ],
        "properties":{  
          "name":"DNA segment, 03B03R (Research Genetics)",
          "bid":"27778"
        }
      },
      {  
        "id":"155928",
        "labels":[  
          "gene"
        ],
        "properties":{  
          "name":"DNA segment, 03.MMHAP34FRA.seq",
          "bid":"53288"
        }
      }
    ],
    "other":[  

    ],
    "relationships":[  

    ],
    "size":3,
    "stats":{  
      "contains_updates":false,
      "nodes_created":0,
      "nodes_deleted":0,
      "properties_set":0,
      "relationships_created":0,
      "relationship_deleted":0,
      "labels_added":0,
      "labels_removed":0,
      "indexes_added":0,
      "indexes_removed":0,
      "constraints_added":0,
      "constraints_removed":0
    }
  },
  "graph":{  
    "nodeMap":{  
      "155926":{  
        "name":"DNA segment, 03B03F (Research Genetics)",
        "bid":"27777"
      },
      "155927":{  
        "name":"DNA segment, 03B03R (Research Genetics)",
        "bid":"27778"
      },
      "155928":{  
        "name":"DNA segment, 03.MMHAP34FRA.seq",
        "bid":"53288"
      }
    },
    "relationshipMap":{  

    }
  }
}
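
A response shaped like the one above could be flattened into simple records for the frontend. The sketch below assumes the response has already been parsed into a Python dict and reads only the `table.nodes` list; the function name `extract_nodes` is hypothetical.

```python
def extract_nodes(response):
    """Flatten the "table" -> "nodes" list into one small record per node."""
    records = []
    for node in response.get("table", {}).get("nodes", []):
        labels = node.get("labels", [])
        properties = node.get("properties", {})
        records.append({
            "id": node["id"],
            "label": labels[0] if labels else None,  # first label, e.g. "gene"
            "name": properties.get("name"),
            "bid": properties.get("bid"),
        })
    return records


# Trimmed copy of the response above, keeping a single node.
response = {
    "table": {
        "nodes": [
            {
                "id": "155926",
                "labels": ["gene"],
                "properties": {
                    "name": "DNA segment, 03B03F (Research Genetics)",
                    "bid": "27777",
                },
            }
        ],
        "relationships": [],
        "size": 1,
    }
}
records = extract_nodes(response)
```

Each record carries the `bid`, which is the same identifier used in the ElasticSearch documents, so a graph node can be joined back to its entity entry.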

Schema - labels

{  
  "table":{  
    "_response":{  
      "columns":[  
        "labels(n)",
        "type(r)"
      ],
      "data":[  
        {  
          "row":[  
            [  
              "disease"
            ],
            "chemical_disease"
          ],
          "graph":{  
            "nodes":[  

            ],
            "relationships":[  

            ]
          }
        },
        {  
          "row":[  
            [  
              "gene"
            ],
            "chemical_gene"
          ],
          "graph":{  
            "nodes":[  

            ],
            "relationships":[  

            ]
          }
        },
        {  
          "row":[  
            [  
              "pathway"
            ],
            "chemical_pathway"
          ],
          "graph":{  
            "nodes":[  

            ],
            "relationships":[  

            ]
          }
        },
        {  
          "row":[  
            [  
              "chemical"
            ],
            "chemical_gene"
          ],
          "graph":{  
            "nodes":[  

            ],
            "relationships":[  

            ]
          }
        },
        {  
          "row":[  
            [  
              "disease"
            ],
            "gene_disease"
          ],
          "graph":{  
            "nodes":[  

            ],
            "relationships":[  

            ]
          }
        },
        {  
          "row":[  
            [  
              "pathway"
            ],
            "gene_pathway"
          ],
          "graph":{  
            "nodes":[  

            ],
            "relationships":[  

            ]
          }
        },
        {  
          "row":[  
            [  
              "chemical"
            ],
            "chemical_disease"
          ],
          "graph":{  
            "nodes":[  

            ],
            "relationships":[  

            ]
          }
        },
        {  
          "row":[  
            [  
              "pathway"
            ],
            "disease_pathway"
          ],
          "graph":{  
            "nodes":[  

            ],
            "relationships":[  

            ]
          }
        },
        {  
          "row":[  
            [  
              "gene"
            ],
            "gene_disease"
          ],
          "graph":{  
            "nodes":[  

            ],
            "relationships":[  

            ]
          }
        },
        {  
          "row":[  
            [  
              "chemical"
            ],
            "chemical_pathway"
          ],
          "graph":{  
            "nodes":[  

            ],
            "relationships":[  

            ]
          }
        },
        {  
          "row":[  
            [  
              "disease"
            ],
            "disease_pathway"
          ],
          "graph":{  
            "nodes":[  

            ],
            "relationships":[  

            ]
          }
        },
        {  
          "row":[  
            [  
              "gene"
            ],
            "gene_pathway"
          ],
          "graph":{  
            "nodes":[  

            ],
            "relationships":[  

            ]
          }
        }
      ],
      "stats":{  
        "contains_updates":false,
        "nodes_created":0,
        "nodes_deleted":0,
        "properties_set":0,
        "relationships_created":0,
        "relationship_deleted":0,
        "labels_added":0,
        "labels_removed":0,
        "indexes_added":0,
        "indexes_removed":0,
        "constraints_added":0,
        "constraints_removed":0
      }
    },
    "nodes":[  

    ],
    "other":[  

    ],
    "relationships":[  

    ],
    "size":12,
    "stats":{  
      "contains_updates":false,
      "nodes_created":0,
      "nodes_deleted":0,
      "properties_set":0,
      "relationships_created":0,
      "relationship_deleted":0,
      "labels_added":0,
      "labels_removed":0,
      "indexes_added":0,
      "indexes_removed":0,
      "constraints_added":0,
      "constraints_removed":0
    }
  },
  "graph":{  
    "nodeMap":{  

    },
    "relationshipMap":{  

    }
  }
}
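
The rows above pair a node label with a relationship type. The column names `labels(n)` and `type(r)` suggest a query along the lines of `MATCH (n)-[r]-() RETURN DISTINCT labels(n), type(r)`, though that is an assumption. A minimal sketch for summarizing such a response per label, with a hypothetical function name, could look like this:

```python
from collections import defaultdict


def relationship_types_by_label(response):
    """Map each node label to the sorted relationship types it takes part in."""
    result = defaultdict(set)
    for item in response["table"]["_response"]["data"]:
        labels, rel_type = item["row"]  # e.g. [["disease"], "chemical_disease"]
        for label in labels:
            result[label].add(rel_type)
    return {label: sorted(types) for label, types in result.items()}


# Three rows taken from the response above.
response = {
    "table": {
        "_response": {
            "columns": ["labels(n)", "type(r)"],
            "data": [
                {"row": [["disease"], "chemical_disease"]},
                {"row": [["chemical"], "chemical_disease"]},
                {"row": [["disease"], "gene_disease"]},
            ],
        }
    }
}
summary = relationship_types_by_label(response)
```

Applied to the full response, this would show at a glance which relationship types each entity type participates in.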

Challenges

  • Data Integration - the source databases need to be parsed into the format the schema entails.
  • Graph Datastore - the quality of the relationships and the scoring of the relationship edges.
  • Frontend - an intuitive interface will enable the curators to contribute towards building the database.
  • Quality evaluation - defining quality parameters for the curators.

Thank you. Any questions?