Overview

dbdataset is a data package containing the dvobject R object. dvobject contains lists of data frames from the parsed DrugBank database, and it was built using the dbparser R package.

dvobject can be used for conveniently exploring and analyzing the contents of the DrugBank database. dvobject is also intended to assist in drug discovery endeavors that plan to make use of the DrugBank database.

It can also be used in machine learning and in related areas such as:

  • Natural Language Processing (NLP)
  • Web Scraping
  • Visualization

Installation

Although dvobject is much smaller than the unparsed DrugBank database, it still exceeds the size limit set by CRAN, so for now it is hosted on GitHub only. It can be installed with the following command:

devtools::install_github("interstellar-Consultation-Services/dbdataset")

Once the package is installed and loaded, a dvobject called drugbank is available as a regular R object.
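
As a minimal sketch, assuming the installation succeeded and that the package exposes drugbank as lazy-loaded data (which the dbdataset::drugbank calls elsewhere in this document suggest), loading the package and running a quick sanity check might look like this:

```r
# Check whether dbdataset is installed before touching it, so this
# snippet does not error on a machine without the package.
dbdataset_installed <- requireNamespace("dbdataset", quietly = TRUE)

if (dbdataset_installed) {
  library(dbdataset)
  # The object is described as an R list, so these two checks are cheap.
  print(is.list(drugbank))
  print(names(drugbank))
} else {
  message("dbdataset is not installed; see the Installation section.")
}
```

The guard is optional; it is only there so the snippet degrades gracefully when the package is missing.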

dvobject Structure

dvobject introduces a unified and compressed format of drugs data. It is an R list object that contains one or more of the following sub-lists:

names(dbdataset::drugbank)
#> [1] "drugs"      "salts"      "products"   "references" "cett"
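
Because the object is a nested list, it can help to see every data.frame it holds in one pass. The helper below is a sketch, not part of dbdataset or dbparser: list_data_frames is a hypothetical name, and toy is a made-up stand-in that only mimics the shape of a dvobject.

```r
# Recursively collect the "path" of every data.frame inside a nested list.
# `x` is any list shaped like a dvobject; `prefix` is used internally.
list_data_frames <- function(x, prefix = character(0)) {
  if (is.data.frame(x)) {
    return(paste(prefix, collapse = "$"))
  }
  if (!is.list(x) || is.null(names(x))) {
    return(character(0))
  }
  paths <- lapply(seq_along(x), function(i) {
    list_data_frames(x[[i]], c(prefix, names(x)[i]))
  })
  as.character(unlist(paths, use.names = FALSE))
}

# Hypothetical toy object, used only to illustrate the helper.
toy <- list(
  drugs = list(synonyms = data.frame(id = 1), groups = data.frame(id = 2)),
  salts = data.frame(id = 3)
)
print(list_data_frames(toy))

# On the real data (only if the package is installed):
if (requireNamespace("dbdataset", quietly = TRUE)) {
  print(head(list_data_frames(dbdataset::drugbank)))
}
```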

The following is the definition for each sub-list:

drugs

A list of data.frames that contain drug information (e.g. synonyms, classifications, …). It is the only mandatory list.

names(dbdataset::drugbank[["drugs"]])
#>  [1] "general_information"     "drug_classification"    
#>  [3] "synonyms"                "pharmacology"           
#>  [5] "international_brands"    "mixtures"               
#>  [7] "packagers"               "manufacturers"          
#>  [9] "prices"                  "categories"             
#> [11] "dosages"                 "atc_codes"              
#> [13] "patents"                 "drug_interactions"      
#> [15] "sequences"               "calculated_properties"  
#> [17] "experimental_properties" "external_identifiers"   
#> [19] "pathway"                 "reactions"              
#> [21] "snp_effects"             "snp_adverse_reactions"  
#> [23] "food_interactions"       "pdb_entries"            
#> [25] "ahfs_codes"              "affected_organisms"     
#> [27] "groups"                  "external_links"
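
To get a feel for the size of each table, a short sketch like the following counts rows per element. count_rows is a hypothetical helper, not a dbdataset function; it returns NA for any element that is not a plain data.frame, in case some entries turn out to be nested lists.

```r
# Count rows for every data.frame in a named list of tables.
# Elements that are not data.frames yield NA instead of an error.
count_rows <- function(tables) {
  vapply(
    tables,
    function(tbl) if (is.data.frame(tbl)) nrow(tbl) else NA_integer_,
    integer(1)
  )
}

# On the real data (only if the package is installed):
if (requireNamespace("dbdataset", quietly = TRUE)) {
  print(count_rows(dbdataset::drugbank[["drugs"]]))
}
```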

salts

A data.frame that contains drug salt information

head(dbdataset::drugbank[["salts"]], 5)
#>     db_salt_id                          name       unii  cas_number
#> 1 DBSALT000105            Leuprolide acetate 37JNS02E7V  74381-53-6
#> 2 DBSALT003182           Leuprolide mesylate 8E3C3C493W 944347-41-5
#> 3 DBSALT001439            Sermorelin acetate 00IBG87IQW 114466-38-5
#> 4 DBSALT000093             Goserelin acetate 6YUU2PV0U8 145781-92-6
#> 5 DBSALT001733 Insulin human zinc suspension                       
#>                      inchikey average_mass monoisotopic_mass drugbank_id
#> 1 YFDMUNOZURYOCP-XNHQSDQCSA-N     1269.473    1268.666591578     DB00007
#> 2 MBIDSOMXPLCOHS-XNHQSDQCSA-N      1305.52    1304.633577372     DB00007
#> 3                                     <NA>              <NA>     DB00010
#> 4 IKDXDQDKCZPQSZ-JHYYTBFNSA-N    1329.4624    1328.662568858     DB00014
#> 5                                     <NA>              <NA>     DB00030
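
Since every salt row carries a drugbank_id, looking up the salts of a single drug is a plain subset. The sketch below uses only column names visible in the output above; salts_for_drug is a hypothetical helper name.

```r
# Return the salt records for one or more DrugBank IDs.
# %in% is used instead of == so missing IDs never produce NA rows.
salts_for_drug <- function(salts, drugbank_id) {
  keep <- salts$drugbank_id %in% drugbank_id
  salts[keep, c("db_salt_id", "name", "cas_number"), drop = FALSE]
}

# On the real data (only if the package is installed):
if (requireNamespace("dbdataset", quietly = TRUE)) {
  print(salts_for_drug(dbdataset::drugbank[["salts"]], "DB00007"))
}
```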

products

A data.frame of commercially available drug products from around the world

head(dbdataset::drugbank[["products"]], 5)
#>       name               labeller ndc_id ndc_product_code   dpd_id
#> 1 Refludan                  Bayer               50419-150         
#> 2 Refludan                  Bayer                         02240996
#> 3 Refludan Celgene Europe Limited                                 
#> 4 Refludan Celgene Europe Limited                                 
#> 5 Refludan Celgene Europe Limited                                 
#>   ema_product_code   ema_ma_number started_marketing_on ended_marketing_on
#> 1                                            1998-03-06         2013-06-30
#> 2                                            2000-01-31         2013-07-26
#> 3  EMEA/H/C/000122 EU/1/97/035/001           2016-09-08         2012-07-27
#> 4  EMEA/H/C/000122 EU/1/97/035/002           2016-09-08         2012-07-27
#> 5  EMEA/H/C/000122 EU/1/97/035/003           2016-09-08         2012-07-27
#>                        dosage_form     strength       route
#> 1                           Powder    50 mg/1mL Intravenous
#> 2             Powder, for solution 50 mg / vial Intravenous
#> 3 Injection, solution, concentrate        50 mg Intravenous
#> 4 Injection, solution, concentrate        50 mg Intravenous
#> 5 Injection, solution, concentrate        20 mg Intravenous
#>   fda_application_number generic over_the_counter approved country  source
#> 1              NDA020807   false            false     true      US FDA NDC
#> 2                          false            false     true  Canada     DPD
#> 3                          false            false    false      EU     EMA
#> 4                          false            false    false      EU     EMA
#> 5                          false            false    false      EU     EMA
#>   drugbank_id
#> 1     DB00001
#> 2     DB00001
#> 3     DB00001
#> 4     DB00001
#> 5     DB00001
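
The source column records which regulator listing a product row came from, so a per-drug tally is one way to summarise this table. This is a sketch using only columns shown above; count_products_by_source is a hypothetical helper name.

```r
# Tally product rows per source (e.g. FDA NDC, DPD, EMA) for given drugs.
count_products_by_source <- function(products, drugbank_id) {
  rows <- products[products$drugbank_id %in% drugbank_id, , drop = FALSE]
  table(rows$source)
}

# On the real data (only if the package is installed):
if (requireNamespace("dbdataset", quietly = TRUE)) {
  print(count_products_by_source(dbdataset::drugbank[["products"]], "DB00001"))
}
```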

references

A list of data.frames of articles, links and textbooks about drugs or CETT data

names(dbdataset::drugbank[["references"]])
#> [1] "drugs"        "carriers"     "enzymes"      "targets"      "transporters"

cett

A list of data.frames that contain carriers, enzymes, targets and transporters (CETT) information

names(dbdataset::drugbank[["cett"]])
#> [1] "carriers"     "enzymes"      "targets"      "transporters"
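
The references and cett lists share entity names, which makes it easy to check which CETT groups have a matching reference list. The sketch below compares names only and makes no assumption about what each element contains; reference_groups is a hypothetical helper name.

```r
# Compare the element names of the references and cett sub-lists.
reference_groups <- function(dv) {
  ref_names <- names(dv[["references"]])
  cett_names <- names(dv[["cett"]])
  list(
    cett_with_references = intersect(cett_names, ref_names),
    references_only = setdiff(ref_names, cett_names)
  )
}

# On the real data (only if the package is installed):
if (requireNamespace("dbdataset", quietly = TRUE)) {
  print(reference_groups(dbdataset::drugbank))
}
```

For the names printed above, references_only would contain just "drugs", since the other four reference groups each correspond to a CETT entity.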

Package Version

The package version always matches the version of the DrugBank database it was built from.
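
To see which DrugBank release an installed copy corresponds to, the installed package version can be queried. This is a small sketch; dbdataset_version is a hypothetical helper name, and it returns NA when the package is not installed.

```r
# Report the installed dbdataset version as a string, or NA if absent.
dbdataset_version <- function() {
  if (!requireNamespace("dbdataset", quietly = TRUE)) {
    return(NA_character_)
  }
  as.character(utils::packageVersion("dbdataset"))
}

print(dbdataset_version())
```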