Awesome Computational Biology
A knowledge collection of databases, software and papers related to computational biology.
Computational biology involves the development and application of data-analytical and theoretical methods,
mathematical modelling and computational simulation techniques to the study of biological, ecological,
behavioural, and social systems. - Wikipedia
Contents
Databases
scRNA
- Gene Expression Omnibus - Public functional genomics database.
- Single Cell PORTAL - Public database for single-cell RNA data.
- Single Cell Expression Atlas - Public database for single-cell RNA data.
Compound
- PubChem - One of the largest chemical databases, covering compounds, genes, and proteins.
- ChEBI - Chemical database focused on small chemical compounds.
- ChEMBL - Database of bioactive molecules with drug-like properties.
- ChemSpider - Chemical structure database.
- KEGG COMPOUND - Collection of small molecules and biopolymers.
- LIPID MAPS - Database of lipids.
- Rhea - Database of chemical reactions.
- Drug Repurposing Hub - Collection of drug repurposing data, including drugs, mechanisms of action (MoA), and targets.
Pathway
- PathwayCommons - Database of pathways and interactions.
- KEGG PATHWAY - Collection of manually drawn pathway maps.
- WikiPathways - Database of biological pathways.
Mass Spectra
- MassBank - Open source databases and tools for mass spectrometry reference spectra.
- MoNA (MassBank of North America) - Meta-database of metabolite mass spectra, metadata, and associated compounds.
Protein
- THE HUMAN PROTEIN ATLAS - One of the largest human protein databases, covering cells, tissues, and organs.
- PROTEIN DATA BANK - Database of the 3D shapes of proteins, nucleic acids, and complex assemblies.
- UniProt - Collection of functional information on proteins.
- AlphaFold Protein Structure Database - Database of 3D protein structures.
Genome
- Human Genome Resources at NCBI - Database of image, proteomics, transcriptomics and systems biology.
- GenBank - Database of genetic sequences offered by NCBI.
- UCSC Genome Browser - Genome browser offered by UCSC.
- cBioPortal - Database of cancer genomics. It provides an overview across many patients.
Disease
- KEGG DRUG - Comprehensive drug information resource for approved drugs.
- DrugBank - Database of drugs and drug targets maintained by the University of Alberta.
Interaction
- Drug Gene Interaction
- Drug (-Cell line) Response
- Chemical Protein Interaction
  - STITCH - Database of chemical-protein interactions.
  - BindingDB - Database of compounds and targets.
- Protein-Protein Interaction
  - STRING - Protein-Protein Interaction Networks for several organisms.
  - BioGRID - Database of Protein, Genetic and Chemical Interactions.
  - HIPPIE - Human Protein-Protein Interaction database.
- Knowledge Graph
API
Preprocess
- Chemistry Development Kit - Software library for cheminformatics and machine learning.
- RDKit - Software library for cheminformatics and machine learning.
- Scanpy - scRNA analysis library in Python.
- Seurat - scRNA analysis library in R.
Machine Learning Tasks and Models
Drug Response Prediction
- drGAT - A model for drug response prediction with gene-level explainability through an attention mechanism.
Drug Repurposing
- DeepPurpose - A deep learning library for drug repurposing and related tasks.
- DRKG - A library for biological knowledge graphs.
Drug Target Interaction
- NeoDTI - A library for Drug Target Interaction.
Compound Protein Interaction
- MCPINN - A library for drug discovery using Compound Protein Interaction and Machine Learning.
- TransformerCPI - A library for Compound Protein Interaction prediction using Transformer.
Pre-trained embedding
LLM for biology
- AI4Chem/ChemLLM-7B-Chat - LLM for chemical and molecular science.
- BioGPT - LLM for biomedical text generation.
- GeneGPT - LLM for biomedical information with several APIs.
- GenePT - Foundation LLM for single-cell data.
-