The First Affiliated Hospital of Xi’an Jiaotong University
1 Modular Analysis Workflow & Technical Scheme
1.1 Part 1 TCM Component and Target Mining
1.1.1 Module 1 ZWHQD Active Component Screening
1.1.1.1 Function
Based on the DCABM-TCM database, blood-absorbed prototype compounds and metabolites of the six medicinal herbs in Zhenwu Huangqi Decoction (ZWHQD) were collected. Strict filtering rules were then applied, key metabolites were manually supplemented, and chemical descriptors were batch-retrieved from PubChem to generate a standardized dataset of candidate active components.
1.1.1.2 Screening Criteria
Compounds without valid PubChem CID were removed.
Highly toxic diester-type diterpenoid alkaloids from Aconiti Radix Lateralis Preparata were excluded.
Endogenous nutrients including sugars, amino acids, vitamins, and common organic acids were discarded.
The key astragalus metabolite cycloastragenol was manually supplemented.
Duplicates and ambiguous annotations were removed; only compound categories with documented heart failure-related activity were retained.
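The screening rules above can be sketched as a small filter pipeline. This is a minimal Python illustration, not the workflow's actual code: the `Compound` record, the category labels, and the toy herbs and CIDs are hypothetical placeholders, and the real screening relied on the manual annotations in compounds_properties_manu.xlsx.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Compound:
    herb: str
    name: str
    cid: Optional[int]  # PubChem CID; None when the source row has no valid CID
    category: str       # hypothetical category label used by the screening rules

# Hypothetical labels standing in for the toxicity and nutrient exclusion rules
TOXIC_CATEGORIES = {"diester-type diterpenoid alkaloid"}
NUTRIENT_CATEGORIES = {"sugar", "amino acid", "vitamin", "organic acid"}

def screen(compounds, supplements, hf_categories):
    """Apply the screening rules in order and return the retained compounds."""
    kept = [c for c in compounds if c.cid is not None]                 # valid CID only
    kept = [c for c in kept if c.category not in TOXIC_CATEGORIES]     # drop toxic alkaloids
    kept = [c for c in kept if c.category not in NUTRIENT_CATEGORIES]  # drop endogenous nutrients
    kept.extend(supplements)                                            # manual additions
    seen, result = set(), []
    for c in kept:
        key = (c.herb, c.cid)  # one record per herb-compound pair
        if key in seen or c.category not in hf_categories:
            continue  # duplicate, or category without documented HF-related activity
        seen.add(key)
        result.append(c)
    return result

# Toy data; names, herbs, and CIDs are placeholders, not real PubChem records
raw = [
    Compound("herb_A", "compound_1", 101, "flavonoid"),
    Compound("herb_A", "compound_1", 101, "flavonoid"),  # duplicate row
    Compound("herb_B", "compound_2", 102, "diester-type diterpenoid alkaloid"),
    Compound("herb_C", "compound_3", None, "triterpenoid"),
    Compound("herb_D", "compound_4", 103, "amino acid"),
]
extra = [Compound("Astragali Radix", "cycloastragenol", 999, "triterpenoid")]  # placeholder CID
final = screen(raw, extra, hf_categories={"flavonoid", "triterpenoid"})
print([c.name for c in final])  # ['compound_1', 'cycloastragenol']
```

The manual supplement passes through the same category and deduplication checks as the database entries, so a supplemented compound is only kept if its category is on the retained list.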
1.1.1.3 Input
Input | Data Name | Description
Raw herbal compound table | zwhqd_compound_dcabm.xlsx | Original component data downloaded from DCABM-TCM
Manual revision table | compounds_add_manu.xlsx | Manual supplementation and correction of components
Compound property filtering table | compounds_properties_manu.xlsx | Toxicity and category annotations used for screening
1.1.1.4 Core Procedures
Import raw TCM component data from DCABM-TCM.
Text cleaning, compound separation, and extraction of PubChem CID, compound name, and molecular formula.
Manual addition of cycloastragenol.
Manual correction of CIDs and nomenclature; removal of entries lacking a CID and of duplicate records.
Filtering by toxicity classification and compound category to eliminate unqualified ingredients.
Parallel batch query via PubChem to obtain IUPAC name, molecular weight, SMILES, InChI, and InChIKey.
Data integration, deduplication, statistical summary of the number of ingredients per herb, shared compound analysis, and export of supplementary tables.
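The parallel PubChem query in the procedure can be sketched with PubChem's PUG REST property endpoint: split the CID list into batches and fetch them with a small thread pool. This Python sketch is an assumption-laden illustration rather than the scheme's implementation; the batch size, worker count, and exact property names (especially the SMILES field) should be verified against the current PUG REST documentation and PubChem's request-rate policy.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid"
# Assumed property names; verify against the current PUG REST documentation
PROPERTIES = ("IUPACName", "MolecularWeight", "CanonicalSMILES", "InChI", "InChIKey")

def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def property_url(cids, properties=PROPERTIES):
    """Build one PUG REST property request for a batch of CIDs."""
    ids = ",".join(str(c) for c in cids)
    return f"{PUG_REST}/{ids}/property/{','.join(properties)}/JSON"

def fetch_batch(cids):
    """Fetch one batch; the response nests rows under PropertyTable.Properties."""
    with urllib.request.urlopen(property_url(cids), timeout=30) as resp:
        payload = json.load(resp)
    return payload["PropertyTable"]["Properties"]

def fetch_all(cids, batch_size=100, workers=3):
    """Query all batches in parallel; keep workers low to respect PubChem rate limits."""
    batches = chunk(list(cids), batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fetch_batch, batches))
    return [row for batch in results for row in batch]
```

Calling `fetch_all(cids)` on the CID column of the screened table would return one dictionary per compound, keyed by `CID` plus the requested properties, ready to be joined back to the compound records.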
1.1.1.5 Output
Output | Data Name | Description
Standardized active component dataset | compounds_final.RData / compounds_final.xlsx | Filtered and annotated ZWHQD active components
Supplementary tables | Table S1, Table S2 | Component classification and basic information statistics
Shared compound statistics | compound_sharing_stats | Statistics of ingredients shared across herbs
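The per-herb counts and shared compound statistics reduce to simple set operations over herb–compound pairs. A minimal Python sketch, assuming the final table can be reduced to (herb, CID) pairs; the function name and toy data are hypothetical, and "shared" is taken here to mean present in at least two herbs.

```python
from collections import defaultdict

def sharing_stats(pairs):
    """Per-herb compound counts and compounds shared by two or more herbs.

    pairs: iterable of (herb, cid) tuples; repeated pairs are counted once.
    """
    herbs_by_cid = defaultdict(set)
    cids_by_herb = defaultdict(set)
    for herb, cid in pairs:
        herbs_by_cid[cid].add(herb)
        cids_by_herb[herb].add(cid)
    per_herb = {herb: len(cids) for herb, cids in cids_by_herb.items()}
    shared = {cid: sorted(herbs) for cid, herbs in herbs_by_cid.items() if len(herbs) >= 2}
    return per_herb, shared

# Toy input with placeholder herbs and CIDs
per_herb, shared = sharing_stats(
    [("herb_A", 1), ("herb_A", 2), ("herb_B", 2), ("herb_B", 3), ("herb_A", 2)]
)
print(per_herb)  # {'herb_A': 2, 'herb_B': 2}
print(shared)    # {2: ['herb_A', 'herb_B']}
```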
1.1.2 Module 2 Component Target Prediction
1.1.2.1 Function
Potential targets of the screened active components were predicted using three databases: BATMAN-TCM, Super-Pred, and SwissTargetPrediction. Predictions were filtered by confidence thresholds, standardized against UniProt annotations, then merged and deduplicated to construct high-confidence compound–target pairs.
1.1.2.2 Prediction Rules
BATMAN-TCM: retain known (validated) targets; retain predicted targets with score ≥ 0.84.
Super-Pred: retain known experimental targets; retain predicted targets with probability > 70% and model accuracy > 90%.
SwissTargetPrediction: retain predicted targets with probability > 0.1.
All target symbols were standardized to official HGNC nomenclature using the UniProt reviewed human (Swiss-Prot) database. Known and predicted targets were then merged and deduplicated.
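The retention rules and the merge step can be expressed as one predicate per database, followed by symbol mapping and set-based deduplication. A minimal Python sketch: the record layout (`source`, `kind`, `score`, `probability`, `model_accuracy`), the `to_symbol` lookup, and the choice to drop targets without a reviewed human UniProt match are all assumptions for illustration.

```python
def passes(rec):
    """Apply the per-database retention rules to one raw prediction record."""
    source, kind = rec["source"], rec["kind"]
    if kind == "known":
        # Known targets are retained only from the databases whose rules list them
        return source in {"BATMAN-TCM", "Super-Pred"}
    if source == "BATMAN-TCM":
        return rec["score"] >= 0.84
    if source == "Super-Pred":
        # Percentages are assumed to be stored as fractions (70% -> 0.70)
        return rec["probability"] > 0.70 and rec["model_accuracy"] > 0.90
    if source == "SwissTargetPrediction":
        return rec["probability"] > 0.1
    return False

def merge_targets(records, to_symbol):
    """Filter, map targets to HGNC symbols, and deduplicate compound-target pairs.

    to_symbol: dict from a raw target identifier to its HGNC symbol, assumed to be
    built from reviewed human UniProt entries; unmapped targets are dropped.
    """
    pairs = set()
    for rec in records:
        if not passes(rec):
            continue
        symbol = to_symbol.get(rec["target"])
        if symbol is not None:
            pairs.add((rec["compound"], symbol))
    return sorted(pairs)

# Placeholder records and identifiers for illustration only
records = [
    {"source": "BATMAN-TCM", "kind": "predicted", "compound": "c1", "target": "P1", "score": 0.90},
    {"source": "BATMAN-TCM", "kind": "predicted", "compound": "c1", "target": "P2", "score": 0.50},
    {"source": "Super-Pred", "kind": "predicted", "compound": "c1", "target": "P1",
     "probability": 0.80, "model_accuracy": 0.95},
    {"source": "SwissTargetPrediction", "kind": "predicted", "compound": "c2", "target": "P3",
     "probability": 0.05},
    {"source": "Super-Pred", "kind": "known", "compound": "c2", "target": "P4"},
]
to_symbol = {"P1": "GENE1", "P2": "GENE2", "P3": "GENE3", "P4": "GENE4"}
print(merge_targets(records, to_symbol))  # [('c1', 'GENE1'), ('c2', 'GENE4')]
```

Deduplicating on the mapped symbol rather than the raw identifier is what lets the same target reported by two databases collapse into one pair.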
1.1.2.3 Input
Input | Data Name | Description
Active component dataset | compounds_final.RData | Screened components output by Module 1
Raw target datasets | BATMAN / Super-Pred / SwissTargetPrediction raw data | Downloaded target prediction results
Human protein annotation library | UniProt human reviewed database | Standardization of gene symbols and UniProt IDs
1.1.2.4 Output
Output | Data Name | Description
Known component targets | compound_target_known.Rdata | Experimentally validated compound–target pairs
Predicted component targets | compound_target_pred.Rdata | High-confidence predicted targets
Integrated target set | compound_target_final.Rdata | Merged and deduplicated compound–target dataset
Supplementary tables | Table S3–S6 | Target statistics and annotation tables
1.2 Part 2 GEO Data Preprocessing and Disease Target Mining
1.2.1.1 Function
Public transcriptomic datasets were downloaded, annotated at the probe level, integrated, and batch-corrected. Two standardized expression matrices were generated to meet the requirements of different downstream analyses.
1.2.1.2 Input
Input | Data Name | Description
GEO transcriptome datasets | GSE141910, GSE17755, GSE1810 | Expression profiling data related to heart failure with reduced ejection fraction (HFrEF)
Platform annotation files | GPL16043, GPL13497, GPL1219 | Probe-to-gene annotation reference
1.2.1.3 Core Procedures
Probe sets were mapped to gene symbols; when multiple probes mapped to the same gene, their expression values were collapsed by taking the maximum.
Datasets were merged across platforms and batches, followed by batch correction with sva::ComBat.
Two normalized matrices were exported:
Matrix A: log2-transformed, used for DEG analysis, WGCNA, and functional enrichment
Matrix B: linear (non-log2) scale, used exclusively for CIBERSORT immune infiltration analysis
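Two of the steps above are easy to misread: the probe-to-gene collapse, and how the linear matrix relates to the log2 one. This Python sketch reads "maximum expression value" as a per-sample maximum across a gene's probes (another common reading keeps the single probe with the highest mean). It also assumes ComBat was run on log2-scale data, so that Matrix B is a back-transform with 2^x; the scheme states neither detail, and ComBat itself (an R function in sva) is not reimplemented here.

```python
from collections import defaultdict

def collapse_probes(expr, probe_to_gene):
    """Collapse probe rows to gene rows by taking the per-sample maximum.

    expr: {probe_id: [value per sample]}; probes without a gene symbol are skipped.
    """
    rows_by_gene = defaultdict(list)
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene:
            rows_by_gene[gene].append(values)
    return {gene: [max(col) for col in zip(*rows)] for gene, rows in rows_by_gene.items()}

def to_linear(log2_matrix):
    """Back-transform a log2 matrix to linear scale (assumes no pseudocount was added)."""
    return {gene: [2.0 ** v for v in values] for gene, values in log2_matrix.items()}

# Hypothetical probe IDs and values
expr = {"p1": [1.0, 5.0], "p2": [3.0, 2.0], "p3": [4.0, 4.0], "p4": [9.0, 9.0]}
mapping = {"p1": "GENE1", "p2": "GENE1", "p3": "GENE2"}  # p4 has no gene symbol
print(collapse_probes(expr, mapping))    # {'GENE1': [3.0, 5.0], 'GENE2': [4.0, 4.0]}
print(to_linear({"GENE1": [0.0, 3.0]}))  # {'GENE1': [1.0, 8.0]}
```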
1.2.1.4 Output
Output | Data Name | Description
Batch-corrected expression matrices | log2 matrix & linear matrix | Two normalized matrices for different downstream analyses
Sample grouping metadata | sample_group_info.csv | Group labels of HFrEF and control samples
Gene probe mapping table | probe_gene_mapping.csv | Correspondence between probes and gene symbols
1.4.2.1 Function
Estimate the relative abundance of 22 immune cell subtypes, compare differences between groups, and analyze correlations between hub genes and immune cells.
1.4.2.2 Input
Input | Data Name | Description
Non-log2 expression matrix | linear_expression_matrix | Unlogged batch-corrected data for immune deconvolution
Sample grouping table | sample_group_info.csv | Group labels for intergroup comparison
1.4.2.3 Rules
Only the non-log2 (linear) matrix is used as input; log-transformed data must not be used
Reference signature matrix: LM22
Only samples with a deconvolution P < 0.05 are retained
1.4.2.4 Output
Output | Data Name | Description
Immune cell abundance matrix | immune_cell_abundance.csv | Relative proportions of 22 immune cell subtypes
Intergroup difference plots | immune_boxplot.png | Differences in immune cell abundance between groups
Correlation heatmap | gene_immune_heatmap.png | Correlation between hub genes and immune cells
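The correlation heatmap summarizes pairwise correlations between hub gene expression and immune cell proportions. The scheme does not name the correlation method; Spearman's rank correlation is assumed here because proportions are bounded and often skewed. This pure-Python sketch computes it as the Pearson correlation of average ranks and omits P values.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def correlation_matrix(gene_expr, cell_abundance):
    """rho for every (hub gene, immune cell) pair; values share the same sample order."""
    return {(g, c): spearman(gx, cy)
            for g, gx in gene_expr.items() for c, cy in cell_abundance.items()}
```

Each entry of the returned dictionary corresponds to one cell of the heatmap, with hub genes on one axis and immune cell subtypes on the other.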
1.4.3 Module 11 Single-Cell RNA-seq Analysis
1.4.3.1 Function
Perform single-cell quality control, dimensionality reduction, clustering, and cell type annotation, then evaluate hub gene set activity across cell subtypes.
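Hub gene set activity per cell subtype can be approximated by averaging the normalized expression of the gene set in each cell, then averaging within each annotated cell type. This is a deliberately simplified Python sketch, not Seurat's `AddModuleScore` (which additionally subtracts the expression of binned control genes); the input layout and cell type labels are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def gene_set_score(cell_expr, gene_set):
    """Mean normalized expression of gene_set in one cell; absent genes count as zero."""
    return mean(cell_expr.get(g, 0.0) for g in gene_set)

def score_by_cell_type(cells, cell_types, gene_set):
    """Average the per-cell scores within each annotated cell type."""
    per_type = defaultdict(list)
    for cell_id, expr in cells.items():
        per_type[cell_types[cell_id]].append(gene_set_score(expr, gene_set))
    return {cell_type: mean(scores) for cell_type, scores in per_type.items()}

# Hypothetical cells, genes, and cell type labels
cells = {"c1": {"G1": 1.0, "G2": 3.0}, "c2": {"G1": 3.0, "G2": 5.0}, "c3": {"G1": 0.0}}
types = {"c1": "type_A", "c2": "type_A", "c3": "type_B"}
print(score_by_cell_type(cells, types, ["G1", "G2"]))  # {'type_A': 3.0, 'type_B': 0.0}
```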
1.4.3.2 Input
Input | Data Name | Description
Single-cell count matrix | GSE1810_raw_count | Original single-cell expression matrix
1.4.3.3 Parameters
Low-quality cells are filtered by the proportion of mitochondrial gene counts; no global ComBat batch correction is applied
Normalization: LogNormalize; top 2,000 highly variable genes
PCA: top 15 principal components; clustering resolution = 1.5; UMAP for visualization
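The QC and normalization parameters can be illustrated on a toy count layout. This Python sketch implements the per-cell mitochondrial percentage (genes prefixed `MT-`, the usual human convention) and the LogNormalize formula used by Seurat, log1p(count / total × scale factor) with Seurat's default scale factor of 10,000. The 20% mitochondrial cutoff is a hypothetical placeholder, since the scheme does not state one; highly variable gene selection, PCA, clustering, and UMAP are left to Seurat itself.

```python
import math

def percent_mito(counts, prefix="MT-"):
    """Percentage of a cell's counts that come from mitochondrial genes."""
    total = sum(counts.values())
    mito = sum(v for gene, v in counts.items() if gene.startswith(prefix))
    return 100.0 * mito / total if total else 0.0

def qc_filter(cells, max_percent_mt=20.0):
    """Keep cells at or below the mitochondrial cutoff (hypothetical default)."""
    return {cid: c for cid, c in cells.items() if percent_mito(c) <= max_percent_mt}

def log_normalize(counts, scale_factor=10000.0):
    """Seurat-style LogNormalize for one cell: natural log1p of scaled relative counts."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: math.log1p(v / total * scale_factor) for gene, v in counts.items()}

# Toy cells: "a" has 10% mitochondrial counts, "b" has 50%
cells = {"a": {"MT-CO1": 10, "GENE1": 90}, "b": {"MT-CO1": 50, "GENE1": 50}}
kept = qc_filter(cells)
print(sorted(kept))  # ['a']
print(log_normalize(kept["a"]))
```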