Module 02: Prediction of ZWHQD Compound Targets
1 Overview
This module systematically identifies potential therapeutic targets of the blood-absorbed bioactive ingredients through integrative target prediction. Prediction data were retrieved from seven authoritative online platforms, comprising three classic mainstream databases and four additional professional target-fishing servers: BATMAN-TCM 2.0 [1], SuperPred 3.0 [2], SwissTargetPrediction [3], PharmMapper [4,5], TargetNet [yao2016TargetNet], PPB3 [6], and SEA [7].
1.1 Database Introduction
Seven authoritative target prediction web servers were selected, covering different prediction principles (machine learning, pharmacophore mapping, similarity search, etc.) to ensure the diversity and accuracy of prediction results. Detailed information of each database is as follows:
1.1.1 BATMAN-TCM 2.0 (http://bionet.ncpsb.org.cn/batman-tcm/)
An updated web server dedicated to network pharmacology-based prediction and analysis of traditional Chinese medicine (TCM) [1]. It integrates multiple algorithms to predict potential targets of TCM compounds, with a focus on the compatibility and synergistic effects of TCM ingredients, and provides comprehensive target-related functional annotations, which makes it suitable for target prediction of blood-absorbed TCM compounds.
1.1.2 SuperPred 3.0 (https://prediction.charite.de/)
A web server for predicting Anatomical Therapeutic Chemical (ATC) codes and potential targets of small molecules [2]. Its target prediction is based on a linear logistic regression model, trained on Morgan fingerprints (length 2048) of 1552 different drugs in 233 level 4 ATC classes. It ranks ATC classes and target candidates by score, providing a reference for compound classification and target identification.
1.1.3 SwissTargetPrediction (https://www.swisstargetprediction.ch/)
A widely used target prediction tool developed by the Swiss Institute of Bioinformatics [3]. It predicts potential targets of small molecules by analyzing the similarity between query compounds and known active molecules, supports multiple species (Homo sapiens, Mus musculus, Rattus norvegicus, etc.), and maintains consistent underlying technologies and parameters after interface updates, ensuring the stability and reproducibility of prediction results.
1.1.4 PharmMapper (http://www.lilab-ecust.cn/pharmmapper/)
A web server for potential drug target identification using the pharmacophore mapping approach [4,5]. It constructs a comprehensive target pharmacophore database, matches the pharmacophore of query compounds with the pharmacophores of known targets, and predicts potential targets by calculating the matching degree, which is particularly suitable for the prediction of small molecule targets with specific spatial structures.
1.1.5 TargetNet (https://targetnet.scbdd.com/)
A web service for predicting drug-target interaction profiles via multi-target structure-activity relationship (SAR) models [yao2016TargetNet]. It integrates multiple machine learning algorithms and SAR models to predict targets of small molecules, and provides detailed interaction scores and target functional annotations, which can improve the accuracy of target prediction.
1.1.6 PPB3 (https://ppb3.genome-mining.com/)
A web-based deep learning tool for target prediction using ChEMBL data [6]. It trains deep learning models on a large body of compound-target interaction data in ChEMBL and can predict potential targets of small molecules with high accuracy, which makes it especially suitable for polypharmacology research.
1.1.7 SEA (https://sea.bkslab.org/)
A target prediction tool based on ligand chemistry similarity [7]. It infers potential targets of query compounds by analyzing the similarity between the compound and known ligands of target proteins, and establishes the relationship between protein structure and function through ligand information, with high prediction efficiency and wide coverage of target types.
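Ligand similarity of the kind described above is commonly quantified with the Tanimoto coefficient on binary fingerprints. The sketch below only illustrates that general idea on toy 8-bit vectors; the function name `tanimoto` and the fingerprints are assumptions made for the example, and it is not the scoring procedure of SEA or any other server listed here.

```r
# Tanimoto (Jaccard) coefficient between two binary fingerprints,
# each given as a 0/1 vector of equal length.
tanimoto <- function(a, b) {
  stopifnot(length(a) == length(b))
  both   <- sum(a == 1 & b == 1)  # bits set in both fingerprints
  either <- sum(a == 1 | b == 1)  # bits set in at least one fingerprint
  if (either == 0) return(0)      # two empty fingerprints: defined as 0 here
  both / either
}

# Toy fingerprints (illustrative values only, not real compounds)
query <- c(1, 1, 0, 1, 0, 0, 1, 0)
ligands <- list(
  ligand_A = c(1, 1, 0, 1, 0, 0, 0, 0),
  ligand_B = c(0, 0, 1, 0, 1, 1, 0, 1),
  ligand_C = c(1, 1, 0, 1, 0, 0, 1, 0)
)

# Similarity of the query to each known ligand, most similar first
similarity <- sort(sapply(ligands, tanimoto, a = query), decreasing = TRUE)
print(similarity)
```

A similarity-based server would then attribute the targets of the most similar known ligands to the query compound, usually after an additional statistical correction.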
1.2 Target Screening Rules
To ensure the reliability and specificity of the predicted targets, strict screening rules were formulated based on the characteristics of each database, and the following steps were implemented sequentially:
Raw Data Import
Raw target prediction data were imported uniformly from the seven above-mentioned web servers, including compound information, target protein names, prediction scores, confidence levels, and other related parameters. For BATMAN-TCM 2.0, due to temporary web page parsing failure, prediction data were collected after the server was restored or by alternative reliable channels.
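A uniform import can be sketched as reading one export file per server and tagging each row with its source. The directory layout, file names, column names, and the helper `read_one` below are all hypothetical and exist only for this example; the module's own import helpers live in `utils.R` and may work differently.

```r
# Hypothetical layout: one CSV export per server, named <database>.csv
raw_dir <- file.path(tempdir(), "raw_targets_demo")
dir.create(raw_dir, showWarnings = FALSE)

# Write two small demo exports (toy values, not real predictions)
write.csv(
  data.frame(CID = c("101", "102"), Target = c("AKT1", "TNF"), Score = c(0.91, 0.62)),
  file.path(raw_dir, "batman.csv"), row.names = FALSE
)
write.csv(
  data.frame(CID = "101", Target = "EGFR", Score = 0.55),
  file.path(raw_dir, "swiss.csv"), row.names = FALSE
)

# Read one export and record which database it came from
read_one <- function(path) {
  df <- read.csv(path, colClasses = c(CID = "character"), stringsAsFactors = FALSE)
  df$Database <- tools::file_path_sans_ext(basename(path))
  df
}

files <- list.files(raw_dir, pattern = "\\.csv$", full.names = TRUE)
raw_targets <- do.call(rbind, lapply(files, read_one))
print(raw_targets)
```

Reading `CID` as character avoids type mismatches when the table is later joined to the compound list.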
1.2.1 Confidence Score Filtering (Database-Specific Thresholds)
According to the scoring system of each database, low-confidence predictions were filtered to retain only high-confidence target candidates, with the following specific thresholds:
BATMAN-TCM 2.0: Retain targets with a prediction score ≥ 0.8 (default high-confidence threshold of the server, corresponding to a false positive rate < 5%).
SuperPred: Retain targets with a prediction score ≥ 0.7 (the score corresponds to the probability of the compound interacting with the target, as recommended by the server’s FAQ).
SwissTargetPrediction: Retain targets with a “Probability” score ≥ 0.5 (the score reflects the similarity between the query compound and known ligands, and targets with scores ≥ 0.5 have reliable interaction potential).
PharmMapper: Retain targets with a “Fit Score” ≥ 0.8 (the score reflects the matching degree between the compound’s pharmacophore and the target’s pharmacophore, with scores ≥ 0.8 indicating good matching).
SEA: Retain targets with a “Score” ≥ 20 (the score is based on ligand similarity, and targets with scores ≥ 20 have significant interaction potential, as recommended by the original literature).
TargetNet: Retain targets with a “Prediction Score” ≥ 0.6 (the score is calculated by multi-target SAR models, with scores ≥ 0.6 indicating high confidence).
PPB3: Retain targets with a “Confidence Score” ≥ 0.7 (the deep learning-based score, with scores ≥ 0.7 corresponding to high prediction reliability).
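One way to apply database-specific cutoffs in a single pass is to keep them in a lookup table and join it to the predictions. This is a minimal sketch using the thresholds listed above; the table layout, the column names `Min.Score` and `Score`, and the toy predictions are assumptions for the example.

```r
library(dplyr)

# Thresholds as listed above (column names are illustrative)
thresholds <- data.frame(
  Database  = c("BATMAN-TCM", "SuperPred", "SwissTargetPrediction",
                "PharmMapper", "SEA", "TargetNet", "PPB3"),
  Min.Score = c(0.8, 0.7, 0.5, 0.8, 20, 0.6, 0.7)
)

# Toy predictions (not real results)
predictions <- data.frame(
  Database = c("BATMAN-TCM", "BATMAN-TCM", "SEA", "SEA", "TargetNet"),
  CID      = c("101", "101", "102", "102", "103"),
  Symbol   = c("AKT1", "TNF", "EGFR", "SRC", "IL6"),
  Score    = c(0.92, 0.41, 35, 12, 0.6)
)

# Keep only rows that meet the cutoff of their own database
high_confidence <- predictions %>%
  inner_join(thresholds, by = "Database") %>%
  filter(Score >= Min.Score) %>%
  select(-Min.Score)

print(high_confidence)
```

Keeping the cutoffs in one table makes it easy to report them alongside the results and to change a single threshold without touching the filtering code.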
1.2.2 Gene Symbol Standardization
All retained candidate target proteins were mapped and standardized to official human gene symbols using the UniProt knowledgebase (https://www.uniprot.org/). For targets with non-standard names or aliases, the corresponding official gene symbols were confirmed by searching the UniProt database, and targets that could not be standardized (no corresponding official gene symbols) were excluded.
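The mapping step can be sketched as a join against a local UniProt table followed by removal of unmatched entries. The column names `Entry` and `Gene.Names.primary` follow the scripts in this module; the small lookup table and the unmatched accession `X00000` are made up for the example.

```r
library(dplyr)

# Toy stand-in for the reviewed human UniProt table
uniprot_demo <- data.frame(
  Entry              = c("P31749", "P01375", "P00533"),
  Gene.Names.primary = c("AKT1", "TNF", "EGFR")
)

# Candidate targets; "X00000" is a made-up accession with no match
candidates <- data.frame(
  CID        = c("101", "101", "102"),
  Uniprot.ID = c("P31749", "P01375", "X00000")
)

standardized <- candidates %>%
  left_join(uniprot_demo, by = c("Uniprot.ID" = "Entry")) %>%
  filter(!is.na(Gene.Names.primary)) %>%   # drop targets without an official symbol
  rename(Symbol = Gene.Names.primary)

print(standardized)
```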
1.3 Merge & Deduplication
Target data from different databases were merged, and redundant targets (the same official gene symbol corresponding to multiple prediction results) were removed. For the same target predicted by multiple databases, the highest prediction score among all databases was retained as the final confidence score of the target, to enhance the reliability of the target.
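The merge rule described above (one row per compound-target pair, keeping the highest score) can be sketched with a grouped summary. The input tables, the `Final.Score` and `N.Databases` column names, and the values are assumptions for illustration; counting supporting databases is an optional extra, not something the text prescribes.

```r
library(dplyr)

# Toy per-database results (not real predictions)
swiss <- data.frame(
  CID = c("101", "102"), Symbol = c("AKT1", "EGFR"),
  Score = c(0.65, 0.80), Database = "SwissTargetPrediction"
)
ppb3 <- data.frame(
  CID = c("101", "103"), Symbol = c("AKT1", "TNF"),
  Score = c(0.90, 0.75), Database = "PPB3"
)

merged <- bind_rows(swiss, ppb3) %>%
  group_by(CID, Symbol) %>%
  summarise(
    Final.Score = max(Score),            # highest score across databases
    N.Databases = n_distinct(Database),  # how many servers support the pair
    .groups = "drop"
  )

print(merged)
```

Because the servers score on different scales, the maximum is only directly comparable between databases whose scores share a range.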
1.3.1 Final Target Confirmation
After the above steps, the final high-confidence compound-target interaction pairs were sorted and consolidated, ensuring that each pair has clear prediction confidence and standardized gene symbols, and excluding pairs with ambiguous target information or low confidence.
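A few assertions can confirm that the consolidated table meets these requirements before export. The helper `check_pairs` and the column names below are hypothetical; this is one possible set of checks, not the module's own validation code.

```r
# Stop with an error if the final table violates a basic requirement
check_pairs <- function(df) {
  stopifnot(
    !anyNA(df$CID),                                # every pair has a compound
    !anyNA(df$Symbol), all(nzchar(df$Symbol)),     # every pair has a gene symbol
    !anyNA(df$Final.Score),                        # every pair has a confidence score
    !any(duplicated(df[, c("CID", "Symbol")]))     # each pair appears once
  )
  invisible(TRUE)
}

# Toy final table (not real results)
final_pairs <- data.frame(
  CID         = c("101", "102", "103"),
  Symbol      = c("AKT1", "EGFR", "TNF"),
  Final.Score = c(0.90, 0.80, 0.75)
)

check_pairs(final_pairs)
```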
1.4 Workflow Summary
Import raw target prediction data from seven authoritative web servers.
Filter low-confidence targets according to database-specific scoring thresholds.
Standardize target gene symbols using the UniProt knowledgebase.
Merge cross-database target data and remove redundancy.
Confirm and output final high-quality compound-target pairs.
1.5 Main Outputs
Standardized high-quality compound-target interaction dataset, detailed prediction score annotation tables, and supplementary target screening statistics tables.
2 Load Packages
3 Step 1: Process BATMAN-TCM Targets
3.1 Load Data
3.2 Known Targets
Code
compound_target_known_batman <- compounds_final %>%
  left_join(batman_target_known, by = "CID", relationship = "many-to-many") %>%
  select(-ends_with(".x")) %>%
  rename_with(~ str_replace(., "\\.y$", ""), ends_with(".y")) %>%
  distinct(Herb.Name.Pinyin, CID, Uniprot.ID, .keep_all = TRUE) %>%
  filter(!is.na(Symbol))
3.3 Predicted Targets (Score ≥ 0.84)
Code
compound_target_pred_batman <- compounds_final %>%
  left_join(batman_target_pred, by = "CID", relationship = "many-to-many") %>%
  select(-ends_with(".x")) %>%
  rename_with(~ str_replace(., "\\.y$", ""), ends_with(".y")) %>%
  distinct(Herb.Name.Pinyin, CID, Uniprot.ID, .keep_all = TRUE) %>%
  filter(!is.na(Symbol), Score >= 0.84)
3.4 Save
4 Step 2: Process SuperPred Targets
Code
rm(list = ls())
source("../../scripts/utils.R")
load("../../data/processed/tcm/06_compounds_final.RData")
load("../../data/raw/uniprot/uniprot_human_reviewed.RData")

# Known targets
known <- read_compound_target("../../data/raw/tcm/super/known") %>%
  select(CID, Protein.name = `Target Name`, Uniprot.ID = `UniProt ID`)

# Predicted targets (Probability > 70, Model accuracy > 90)
pred <- read_compound_target("../../data/raw/tcm/super") %>%
  filter(Probability > 70, `Model accuracy` > 90) %>%
  select(CID, Protein.name = `Target Name`, Uniprot.ID = `UniProt ID`)
4.1 Merge & Annotate
Code
# Annotate targets with the primary gene symbol from reviewed human UniProt entries
process_super <- function(df) {
  df %>%
    left_join(uniprot_human_reviewed, by = c("Uniprot.ID" = "Entry")) %>%
    filter(!is.na(Gene.Names.primary)) %>%
    mutate(Symbol = Gene.Names.primary) %>%
    distinct(CID, Uniprot.ID, .keep_all = TRUE)
}

known_clean <- process_super(known)
pred_clean <- process_super(pred)

# Merge with compounds
compound_target_known_super <- compounds_final %>%
  left_join(known_clean, by = "CID", relationship = "many-to-many") %>%
  distinct(Herb.Name.Pinyin, CID, Uniprot.ID, .keep_all = TRUE) %>%
  filter(!is.na(Symbol))

compound_target_pred_super <- compounds_final %>%
  left_join(pred_clean, by = "CID", relationship = "many-to-many") %>%
  distinct(Herb.Name.Pinyin, CID, Uniprot.ID, .keep_all = TRUE) %>%
  filter(!is.na(Symbol))
4.2 Save
Code
save(
  compound_target_known_super,
  compound_target_pred_super,
  file = "../../data/processed/tcm/08_super_targets.RData"
)

# Table S4: super targets
write.xlsx(
  list(Known = compound_target_known_super, Predicted = compound_target_pred_super),
  "../../tables/supplementary/Table_S4_super_targets.xlsx"
)
5 Step 3: Process SwissTargetPrediction
Code
rm(list = ls())
source("../../scripts/utils.R")
load("../../data/processed/tcm/06_compounds_final.RData")
load("../../data/raw/uniprot/uniprot_human_reviewed.RData")

# Keep predictions with Probability* > 0.10 and split multi-protein rows
result <- read_compound_target("../../data/raw/tcm/swiss") %>%
  filter(`Probability*` > 0.10) %>%
  separate_rows(`Common name`, `Uniprot ID`, sep = " ") %>%
  mutate(
    CID = clean_cids(CID),
    Uniprot.ID = gsub("[[:space:]]", "", `Uniprot ID`),
    Symbol = gsub("[[:space:]]", "", `Common name`)
  ) %>%
  select(CID, Uniprot.ID) %>%
  distinct(CID, Uniprot.ID, .keep_all = TRUE) %>%
  left_join(uniprot_human_reviewed, by = c("Uniprot.ID" = "Entry")) %>%
  filter(!is.na(Gene.Names.primary))

compound_target_pred_swiss <- compounds_final %>%
  left_join(result, by = "CID", relationship = "many-to-many") %>%
  filter(!is.na(Gene.Names.primary)) %>%
  dplyr::rename(Symbol = Gene.Names.primary) %>%
  distinct(Herb.Name.Pinyin, CID, Symbol, .keep_all = TRUE)
5.1 Save
6 Step 4: Process PharmMapper
Code
rm(list = ls())
source("../../scripts/utils.R")
load("../../data/processed/tcm/06_compounds_final.RData")
load("../../data/raw/uniprot/uniprot_human_reviewed.RData")

# Keep human targets with a positive z-score and Norm Fit > 0.9
result <- read_compound_target2("../../data/raw/tcm/pharm", skip_row = 1) %>%
  dplyr::filter(grepl("_HUMAN", Uniplot), zscore > 0, `Norm Fit` > 0.9) %>%
  separate_rows(`Common name`, `Uniprot ID`, sep = " ") %>%
  mutate(
    CID = clean_cids(CID),
    Uniprot.ID = gsub("_HUMAN", "", Uniplot)
  ) %>%
  select(CID, Uniprot.ID) %>%
  distinct(CID, Uniprot.ID, .keep_all = TRUE) %>%
  left_join(uniprot_human_reviewed, by = c("Uniprot.ID" = "Entry")) %>%
  filter(!is.na(Gene.Names.primary))

compound_target_pred_pharm <- compounds_final %>%
  left_join(result, by = "CID", relationship = "many-to-many") %>%
  filter(!is.na(Gene.Names.primary)) %>%
  dplyr::rename(Symbol = Gene.Names.primary) %>%
  distinct(Herb.Name.Pinyin, CID, Symbol, .keep_all = TRUE)
6.1 Save
7 Step 5: Process TargetNet
Code
rm(list = ls())
source("../../scripts/utils.R")
load("../../data/processed/tcm/06_compounds_final.RData")
load("../../data/raw/uniprot/uniprot_human_reviewed.RData")

# The score columns of the results sheet are renamed with the query CIDs
# (assumes both sheets list the compounds in the same order)
query <- read.xlsx("../../data/raw/tcm/targetnet/query.xlsx")
cids <- query$CID
col_names <- c("Uniprot.ID", "Protein", cids)

result <- read.xlsx("../../data/raw/tcm/targetnet/results.xlsx") %>%
  dplyr::select(-Details) %>%
  setNames(col_names) %>%
  # Wide (one column per compound) to long (one row per compound-target pair)
  pivot_longer(
    cols = -c(Uniprot.ID, Protein),
    names_to = "CID",
    values_to = "Probability"
  ) %>%
  dplyr::filter(!is.na(Probability), Probability > 0.5) %>%
  mutate(CID = as.numeric(CID)) %>%
  # Highest probability first, so distinct() keeps the top-scoring row per pair
  arrange(CID, desc(Probability)) %>%
  mutate(CID = as.character(CID)) %>%
  dplyr::select(CID, Uniprot.ID, Probability) %>%
  distinct(CID, Uniprot.ID, .keep_all = TRUE) %>%
  left_join(uniprot_human_reviewed, by = c("Uniprot.ID" = "Entry")) %>%
  filter(!is.na(Gene.Names.primary))

compound_target_pred_targetnet <- compounds_final %>%
  left_join(result, by = "CID", relationship = "many-to-many") %>%
  filter(!is.na(Gene.Names.primary)) %>%
  dplyr::rename(Symbol = Gene.Names.primary) %>%
  distinct(Herb.Name.Pinyin, CID, Symbol, .keep_all = TRUE)
8 Step 6: Process PPB3
Code
rm(list = ls())
source("../../scripts/utils.R")
load("../../data/processed/tcm/06_compounds_final.RData")
load("../../data/raw/uniprot/uniprot_human_reviewed.RData")

# Keep single-protein human targets only
result <- read_compound_target("../../data/raw/tcm/ppb3") %>%
  dplyr::filter(`Target Type` == "SINGLE PROTEIN", `Target Organism` == "Homo sapiens")
9 Step 7: Process SEA
10 Step 8: Merge All Targets & Deduplicate
10.1 Common Columns
10.2 Known Targets
Code
load("../../data/processed/tcm/07_batman_targets.RData")
load("../../data/processed/tcm/08_super_targets.RData")

compound_target_known <- rbind(
  compound_target_known_batman[, cnames],
  compound_target_known_super[, cnames]
) %>%
  distinct(Herb.Name.Pinyin, CID, Symbol, .keep_all = TRUE) %>%
  filter(!is.na(Symbol))
10.3 Predicted Targets
10.4 All Targets
11 Step 9: Final Standardization with UniProt
Code
load("../../data/raw/uniprot/uniprot_human_reviewed.RData")

compound_target_final <- compound_target_all %>%
  left_join(uniprot_human_reviewed, by = c("Uniprot.ID" = "Entry")) %>%
  select(-Reviewed, -Entry.Name)

# Export copy without the protein sequence column
compound_target_export <- compound_target_final %>%
  select(-Sequence)

save(
  compound_target_final,
  file = "../../data/processed/tcm/12_compound_target_final.RData"
)
11.1 Export Final Supplementary Table
12 Results
12.1 Target Statistics
Code
load("../../data/processed/tcm/07_batman_targets.RData")
load("../../data/processed/tcm/08_super_targets.RData")
load("../../data/processed/tcm/09_swiss_targets.RData")
load("../../data/processed/tcm/11_all_compound_targets.RData")
cat("===== BATMAN-TCM =====\n")
cat("Known compounds:", n_distinct(compound_target_known_batman$CID), "\n")
cat("Known targets:", n_distinct(compound_target_known_batman$Uniprot.ID), "\n\n")
cat("===== Super-Pred =====\n")
cat("Known compounds:", n_distinct(compound_target_known_super$CID), "\n")
cat("Known targets:", n_distinct(compound_target_known_super$Uniprot.ID), "\n\n")
cat("===== SwissTargetPrediction =====\n")
cat("Pred compounds:", n_distinct(compound_target_pred_swiss$CID), "\n")
cat("Pred targets:", n_distinct(compound_target_pred_swiss$Uniprot.ID), "\n\n")
cat("===== FINAL SUMMARY =====\n")
cat("Total compounds:", n_distinct(compound_target_all$CID), "\n")
cat("Total targets:", n_distinct(compound_target_all$Symbol), "\n")
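The per-database counts printed above could also be collected into a single table, which is easier to export as a statistics table. The helper `summarise_targets` and the demo data frames below are hypothetical and only illustrate the pattern; they do not reproduce the module's results.

```r
library(dplyr)

# Count distinct compounds and targets in one compound-target table
summarise_targets <- function(df, label) {
  data.frame(
    Source    = label,
    Compounds = n_distinct(df$CID),
    Targets   = n_distinct(df$Uniprot.ID)
  )
}

# Toy tables (not real results)
demo_known <- data.frame(
  CID        = c("101", "101", "102"),
  Uniprot.ID = c("P31749", "P01375", "P31749")
)
demo_pred <- data.frame(
  CID        = c("101", "103"),
  Uniprot.ID = c("P00533", "P00533")
)

target_stats <- bind_rows(
  summarise_targets(demo_known, "Known (demo)"),
  summarise_targets(demo_pred, "Predicted (demo)")
)

print(target_stats)
```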