Methods & Technical Scheme

Authors

Affiliations

Kun Hou

Health Science Center, Xi’an Jiaotong University

Hanzhong Traditional Chinese Medicine Hospital

Supervisor’s name

Health Science Center, Xi’an Jiaotong University

The First Affiliated Hospital of Xi’an Jiaotong University

1 Modular Analysis Workflow & Technical Scheme

1.1 Part 1 TCM Component and Target Mining

1.1.1 Module 1 ZWHQD Active Component Screening

1.1.1.1 Function

Blood-absorbed prototype compounds and metabolites of the six medicinal herbs in Zhenwu Huangqi Decoction (ZWHQD) were collected from the DCABM-TCM database. The compounds were then filtered against the criteria below, key metabolites were added manually, and chemical descriptors were retrieved in batch from PubChem to produce a standardized dataset of candidate active components.

1.1.1.2 Screening Criteria

Compounds without valid PubChem CID were removed.
Highly toxic diester-type diterpenoid alkaloids from Aconiti Radix Lateralis Preparata were excluded.
Endogenous nutrients including sugars, amino acids, vitamins, and common organic acids were discarded.
The key astragalus metabolite cycloastragenol was manually supplemented.
Duplicate records and ambiguous annotations were removed; only compound categories with documented heart failure-related activity were retained.

1.1.1.3 Input

Input Content	Data Name	Description
Raw herbal compound table	zwhqd_compound_dcabm.xlsx	Original component data downloaded from DCABM-TCM
Manual revision table	compounds_add_manu.xlsx	Manual supplementation and correction of components
Compound property filtering table	compounds_properties_manu.xlsx	Toxicity and category screening annotation table

1.1.1.4 Core Procedures

Import raw TCM component data from DCABM-TCM.
Text cleaning, compound separation, and extraction of PubChem CID, compound name, and molecular formula.
Manual addition of cycloastragenol.
Manual correction of CID and nomenclature; removal of entries missing CID and duplicate records.
Filtering by toxicity classification and compound category to eliminate unqualified ingredients.
Parallel batch query via PubChem to obtain IUPAC name, molecular weight, SMILES, InChI, and InChIKey.
Data integration, deduplication, statistical summary of ingredient numbers per herb, shared compound analysis, and export of supplementary tables.
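The screening and deduplication steps above can be sketched as follows. This is a minimal Python illustration of the logic only, not the project's own script; the record fields (`cid`, `herb`, `category`) and the category labels are hypothetical stand-ins for whatever the annotation tables actually contain.

```python
# Hypothetical category labels; the real annotation table may use other names.
EXCLUDED_CATEGORIES = {"diester_diterpenoid_alkaloid", "endogenous_nutrient"}


def screen_compounds(records, manual_additions=()):
    """Apply the screening criteria to a list of compound dicts.

    Each record is assumed to carry 'cid', 'herb' and 'category' keys.
    Manually supplemented compounds are appended before filtering.
    """
    kept, seen = [], set()
    for rec in list(records) + list(manual_additions):
        cid = rec.get("cid")
        if not cid:                          # no valid PubChem CID
            continue
        if rec.get("category") in EXCLUDED_CATEGORIES:
            continue                         # toxic alkaloid or nutrient
        key = (cid, rec.get("herb"))         # one row per compound per herb
        if key in seen:                      # duplicate record
            continue
        seen.add(key)
        kept.append(rec)
    return kept


def shared_compounds(kept):
    """Return {cid: set of herbs} for compounds found in more than one herb."""
    herbs_by_cid = {}
    for rec in kept:
        herbs_by_cid.setdefault(rec["cid"], set()).add(rec["herb"])
    return {cid: herbs for cid, herbs in herbs_by_cid.items() if len(herbs) > 1}
```

Keying duplicates on the (CID, herb) pair keeps a compound that occurs in several herbs once per herb, which is what the shared-compound statistics need.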

1.1.1.5 Output

Output Content	Data Name	Description
Standardized active component dataset	compounds_final.RData / compounds_final.xlsx	Filtered and annotated ZWHQD active components
Supplementary tables	Table S1, Table S2	Component classification and basic information statistics
Shared compound statistics	compound_sharing_stats	Statistics of common ingredients across herbs

1.1.2 Module 2 Component Target Prediction

1.1.2.1 Function

Potential targets of the screened active components were predicted using three databases: BATMAN-TCM, Super-Pred, and SwissTargetPrediction. Predictions were filtered by database-specific confidence thresholds, mapped to UniProt annotation, then merged and deduplicated to build the final set of compound–target pairs.

1.1.2.2 Prediction Rules

BATMAN-TCM
- Retain known validated targets
- Predicted targets with score ≥ 0.84

Super-Pred
- Retain known experimental targets
- Predicted targets with probability > 70% and model accuracy > 90%

SwissTargetPrediction
- Predicted targets with probability > 0.1

All target symbols were standardized to official HGNC nomenclature via the reviewed human UniProt database. Known and predicted targets were then merged and deduplicated.
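The three rule sets can be expressed as a single filter followed by a merge. The Python sketch below only illustrates the thresholds stated above; the field names (`known`, `score`, `probability`, `model_accuracy`, `symbol`) are hypothetical, and Super-Pred values are assumed to be given in percent.

```python
def keep_prediction(source, row):
    """Return True if a target record passes the rule for its database."""
    if source == "BATMAN-TCM":
        return row.get("known", False) or row["score"] >= 0.84
    if source == "Super-Pred":
        return row.get("known", False) or (
            row["probability"] > 70 and row["model_accuracy"] > 90
        )
    if source == "SwissTargetPrediction":
        return row["probability"] > 0.1
    raise ValueError(f"unknown source: {source}")


def merge_targets(records):
    """Merge and deduplicate (compound CID, gene symbol) pairs.

    records: iterable of (source, row); each row carries 'cid' and 'symbol'.
    """
    pairs = {
        (row["cid"], row["symbol"])
        for source, row in records
        if keep_prediction(source, row)
    }
    return sorted(pairs)
```

Using a set of (CID, symbol) pairs means a target reported by more than one database is counted once.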

1.1.2.3 Input

Input Content	Data Name	Description
Active component dataset	compounds_final.Rdata	Output screened components from Module 1
Raw target datasets	BATMAN / Super-Pred / SwissTargetPrediction raw data	Downloaded target prediction results
Human protein annotation library	UniProt human reviewed database	Standardize gene symbol and UniProt ID

1.1.2.4 Output

Output Content	Data Name	Description
Known component targets	compound_target_known.Rdata	Experimentally validated compound-target pairs
Predicted component targets	compound_target_pred.Rdata	High-confidence predicted targets
Integrated target set	compound_target_final.Rdata	Merged and deduplicated compound-target dataset
Supplementary tables	Table S3–S6	Target statistics and annotation tables

1.2 Part 2 GEO Data Preprocessing and Disease Target Mining

1.2.1 Module 3 GEO Dataset Download, Merging and Dual Matrix Preprocessing

1.2.1.1 Function

Public transcriptomic datasets were downloaded, probe-annotated, integrated, and batch-corrected. Two standardized expression matrices were generated to meet the different requirements of the downstream analyses.

1.2.1.2 Input

Input Content	Data Name	Description
GEO transcriptome datasets	GSE141910, GSE17755, GSE1810	HFrEF related expression profiling data
Platform annotation files	GPL16043, GPL13497, GPL1219	Probe-to-gene annotation reference

1.2.1.3 Core Procedures

Probe sets were mapped to Gene Symbol; multiple probes per gene were summarized by maximum expression value.
Cross-platform and cross-batch merging followed by batch correction using sva::ComBat.
Two normalized matrices were exported:
- Matrix A: log2-transformed for DEG, WGCNA, functional enrichment
- Matrix B: Linear non-log2 matrix dedicated to CIBERSORT immune infiltration
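Summarizing several probes per gene can be sketched as below. "Maximum expression value" is interpreted here as keeping, for each gene, the probe with the highest mean expression across samples; that interpretation is an assumption, and an element-wise maximum per sample would be an equally plausible reading. The pipeline itself is R-based, so this Python function is illustrative only.

```python
from statistics import mean


def collapse_probes(expr, probe_to_gene):
    """Collapse a probe-level matrix to gene level.

    expr: {probe_id: [value per sample]}
    probe_to_gene: {probe_id: gene symbol}; unmapped probes are dropped.
    Assumption: per gene, keep the probe with the highest mean expression.
    """
    best = {}
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if not gene:
            continue
        if gene not in best or mean(values) > mean(best[gene]):
            best[gene] = values
    return best
```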

1.2.1.4 Output

Output Content	Data Name	Description
Batch-corrected expression matrices	log2 matrix & linear matrix	Dual normalized matrices for different downstream analyses
Sample grouping metadata	sample_group_info.csv	Group label of HFrEF and control samples
Gene probe mapping table	probe_gene_mapping.csv	Annotation correspondence between probe and gene symbol

1.2.2 Module 4 Differentially Expressed Gene (DEG) Screening

1.2.2.1 Function

Differentially expressed genes between HFrEF and healthy controls were identified and visualized.

1.2.2.2 Input

Input Content	Data Name	Description
Normalized expression matrix	log2 expression matrix	Batch-corrected log2 transformed data from Module 3
Sample grouping information	sample_group_info.csv	Group label for differential comparison

1.2.2.3 Parameters

Tool: limma
Loose threshold: P < 0.05, |log2FC| ≥ 0.585
Stringent threshold: |log2FC| > 1
Visualization: Volcano plot
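The two thresholds translate directly into fold changes: |log2FC| ≥ 0.585 corresponds to a change of about 1.5-fold, and |log2FC| > 1 to more than 2-fold. A minimal Python sketch of the classification applied to limma output is shown below. It assumes that the stringent set also requires P < 0.05, which the parameter list does not state explicitly.

```python
import math

LOOSE_LFC = 0.585   # approximately log2(1.5)
STRICT_LFC = 1.0    # log2(2)


def classify_deg(log2fc, p_value, p_cut=0.05):
    """Label a gene 'up', 'down' or 'ns' under the loose threshold."""
    if p_value < p_cut and abs(log2fc) >= LOOSE_LFC:
        return "up" if log2fc > 0 else "down"
    return "ns"


def is_stringent(log2fc, p_value, p_cut=0.05):
    """Stringent call; reusing the P cutoff here is an assumption."""
    return p_value < p_cut and abs(log2fc) > STRICT_LFC


# Sanity check on the loose cutoff: log2(1.5) rounds to 0.585.
assert round(math.log2(1.5), 3) == LOOSE_LFC
```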

1.2.2.4 Output

Output Content	Data Name	Description
DEG list	deg_up / deg_down / deg_all	Up-regulated, down-regulated and all differentially expressed genes
DEG visualization plot	deg_volcano.png	Volcano plot of differential expression

1.2.3 Module 5 WGCNA Weighted Gene Co-expression Network Analysis

1.2.3.1 Function

Construct scale-free co-expression networks, partition gene modules, and screen phenotype-related hub genes associated with HFrEF.

1.2.3.2 Input

Input Content	Data Name	Description
Normalized expression matrix	log2 expression matrix	Batch-corrected log2 expression data
Clinical phenotype data	clinical_trait_info.csv	HFrEF phenotypic characteristic data

1.2.3.3 Core Procedures

Quality control using goodSamplesGenes() to remove samples and genes that fail its missing-value and zero-variance checks.
Optimal soft threshold determined by pickSoftThreshold().
Module-trait correlation calculated via Spearman and Mantel test.
Hub genes filtered by Module Membership (MM ≥ 0.6) and Gene Significance (GS ≥ 0.1).
Key disease-related modules: MEbrown, MEroyalblue.
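The hub-gene filter in the last two steps can be sketched as follows. This Python fragment assumes that MM and GS have already been computed by WGCNA and that their absolute values are compared with the cutoffs; both the use of absolute values and the input layout are assumptions.

```python
def select_hub_genes(stats, key_modules, mm_cut=0.6, gs_cut=0.1):
    """Select hub genes from the key modules.

    stats: {gene: {'module': str, 'MM': float, 'GS': float}}
    key_modules: module colours of interest, e.g. {'brown', 'royalblue'}
    """
    return sorted(
        gene
        for gene, s in stats.items()
        if s["module"] in key_modules
        and abs(s["MM"]) >= mm_cut
        and abs(s["GS"]) >= gs_cut
    )
```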

1.2.3.4 Output

Output Content	Data Name	Description
Gene module classification	wgcna_module_gene_list	Gene distribution in each co-expression module
WGCNA hub gene set	wgcna_hub_genes.txt	Phenotype-related core genes

1.2.4 Module 6 HFrEF Disease Target Integration

1.2.4.1 Function

Integrate DEGs and WGCNA hub genes to obtain candidate disease targets for HFrEF.

1.2.4.2 Input

Input Content	Data Name	Description
Differential gene set	deg_all	Full DEG list from Module 4
Co-expression hub genes	wgcna_hub_genes.txt	Core genes from WGCNA Module 5

1.2.4.3 Core Logic

The DEG list was intersected with the WGCNA hub genes, and only genes with |log2FC| > 1 were retained so that every candidate target shows a substantial expression change.
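As a minimal illustration of this integration step, the Python sketch below intersects the two gene sets and applies the fold-change cutoff. The input layout (a mapping from DEG symbol to log2FC) is an assumption.

```python
def disease_targets(deg_log2fc, hub_genes, lfc_cut=1.0):
    """Intersect DEGs with WGCNA hub genes and keep |log2FC| > lfc_cut.

    deg_log2fc: {gene symbol: log2 fold change} for all DEGs
    hub_genes: iterable of WGCNA hub gene symbols
    """
    shared = set(deg_log2fc) & set(hub_genes)
    return sorted(g for g in shared if abs(deg_log2fc[g]) > lfc_cut)
```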

1.2.4.4 Output

Output Content	Data Name	Description
Integrated disease target set	hfref_disease_targets.txt	Candidate HFrEF pathogenic genes

1.3 Part 3 Network Construction and Hub Gene Identification

1.3.1 Module 7 Compound-Target and PPI Network Construction

1.3.1.1 Function

Construct herb–component–target network and protein–protein interaction (PPI) network of overlapping drug-disease targets.

1.3.1.2 Input

Input Content	Data Name	Description
ZWHQD component targets	compound_target_final.Rdata	Merged compound-target genes from Module 2
HFrEF candidate targets	hfref_disease_targets.txt	Integrated disease genes from Module 6

1.3.1.3 Parameters

Network visualization: Cytoscape 3.10.3
PPI network: STRING database, Homo sapiens, confidence > 0.700
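Turning a filtered STRING edge list into the adjacency matrix listed in the output can be sketched as below. The sketch assumes combined scores on a 0 to 1 scale; bulk STRING link files store them as integers out of 1000, in which case they would need to be divided by 1000 first. Function and argument names are illustrative.

```python
def build_adjacency(edges, min_score=0.7):
    """Build a symmetric 0/1 adjacency matrix from scored PPI edges.

    edges: iterable of (gene_a, gene_b, combined_score), score in [0, 1].
    Returns (sorted node list, matrix as list of lists).
    """
    kept = [(a, b) for a, b, score in edges if score > min_score and a != b]
    nodes = sorted({n for pair in kept for n in pair})
    index = {n: i for i, n in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in kept:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1   # PPI edges are undirected
    return nodes, matrix
```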

1.3.1.4 Output

Output Content	Data Name	Description
Herb-component-target network files	ctn_node.csv / ctn_edge.csv	Node and edge information of regulatory network
PPI network files	ppi_node.csv / ppi_edge.csv	Protein-protein interaction network data
PPI interaction matrix	ppi_adjacency_matrix.csv	Gene interaction adjacency matrix

1.3.2 Module 8 Hub Gene Screening via MCODE and cytoHubba

1.3.2.1 Function

Mine core sub-networks from PPI network and identify ZWHQD–HFrEF hub genes.

1.3.2.2 Input

Input Content	Data Name	Description
PPI network file	ppi_edge.csv	Interaction network from Module 7

1.3.2.3 Parameters

MCODE: degree cutoff = 2, node score cutoff = 0.2, k-core = 2
cytoHubba: 4 local + 6 global topological algorithms
Integration: MCODE core genes + cytoHubba top genes + DEG/WGCNA intersection genes
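Two of the ideas behind these parameters are easy to show in isolation: the k-core (the subgraph left after repeatedly removing nodes with fewer than k neighbours) and degree ranking, one of the local measures cytoHubba offers. The Python sketch below illustrates only these two concepts. It is not a reimplementation of MCODE, which also weights vertices and grows clusters from seed nodes.

```python
def k_core(adjacency, k=2):
    """Return the node set of the k-core of an undirected graph.

    adjacency: {node: set of neighbouring nodes}
    """
    graph = {n: set(nb) for n, nb in adjacency.items()}
    changed = True
    while changed:
        changed = False
        # Nodes currently below the degree requirement.
        for node in [n for n, nb in graph.items() if len(nb) < k]:
            for nb in graph.pop(node):
                if nb in graph:
                    graph[nb].discard(node)
            changed = True
    return set(graph)


def top_by_degree(adjacency, n=10):
    """Rank nodes by degree (ties broken alphabetically), return top n."""
    return sorted(adjacency, key=lambda g: (-len(adjacency[g]), g))[:n]
```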

1.3.2.4 Output

Output Content	Data Name	Description
Final hub gene set	zwhqd_hfref_hub_genes.txt	117 core genes for subsequent analysis

1.4 Part 4 Functional and Mechanism Analysis

1.4.1 Module 9 GO/KEGG Functional Enrichment Analysis

1.4.1.1 Function

Annotate biological functions and core signaling pathways of hub genes.

1.4.1.2 Input

Input Content	Data Name	Description
Hub gene list	zwhqd_hfref_hub_genes.txt	Core genes screened from Module 8

1.4.1.3 Parameters

ID conversion: org.Hs.eg.db
Tool: clusterProfiler
Threshold: P < 0.001
Dimension: BP, CC, MF, KEGG pathway
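Over-representation analysis of this kind rests on a hypergeometric test: given a background of N genes of which K belong to a term, it asks how likely it is that a list of n genes contains at least k term members. The Python sketch below computes that tail probability directly. It illustrates the statistic only, and the argument names are hypothetical.

```python
from math import comb


def enrichment_p(n_universe, n_term, n_query, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: background genes
    n_term:     background genes annotated to the term
    n_query:    genes in the hub gene list
    n_overlap:  hub genes annotated to the term
    """
    upper = min(n_term, n_query)
    total = comb(n_universe, n_query)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

A term would pass the threshold above when this probability is below 0.001.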

1.4.1.4 Output

Output Content	Data Name	Description
Enrichment result tables	go_anno.csv / kegg_anno.csv	GO and KEGG annotation statistics
Enrichment plots	bubble_plot / bar_plot	Visualization of functional enrichment

1.4.2 Module 10 CIBERSORT Immune Infiltration Analysis

1.4.2.1 Function

Estimate relative abundance of 22 immune cell subtypes, compare intergroup differences, and analyze gene–immune correlation.

1.4.2.2 Input

Input Content	Data Name	Description
Non-log2 expression matrix	linear_expression_matrix	Batch-corrected data without log transformation, for immune deconvolution
Sample grouping table	sample_group_info.csv	Group label for difference comparison

1.4.2.3 Rules

Only non-log2 matrix is allowed; log-transformed data are prohibited
Reference signature: LM22
Retain samples with deconvolution P < 0.05
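A simple guard can catch a log-scale matrix before it reaches deconvolution, and the sample filter is a one-line comparison. In the Python sketch below, the cutoff of 50 is a heuristic assumption (log2 expression values rarely exceed it, linear values usually do), and the result layout with a `p_value` field is hypothetical.

```python
def looks_log_scaled(matrix, cutoff=50.0):
    """Heuristic check: True if the largest value is below the cutoff.

    matrix: {gene: [value per sample]}
    """
    return max(max(values) for values in matrix.values()) < cutoff


def filter_samples(results, p_cut=0.05):
    """Keep samples whose deconvolution P value is below p_cut.

    results: {sample: {'p_value': float, ...cell fractions...}}
    """
    return {s: r for s, r in results.items() if r["p_value"] < p_cut}
```

If `looks_log_scaled` returns True for the matrix intended for CIBERSORT, the wrong matrix has probably been loaded.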

1.4.2.4 Output

Output Content	Data Name	Description
Immune cell abundance matrix	immune_cell_abundance.csv	Relative proportion of 22 immune cell subtypes
Intergroup difference plots	immune_boxplot.png	Immune cell difference between groups
Correlation heatmap	gene_immune_heatmap.png	Correlation between hub genes and immune cells

1.4.3 Module 11 Single-Cell RNA-seq Analysis

1.4.3.1 Function

Perform single-cell quality control, dimensionality reduction, clustering, cell type annotation, and evaluate hub gene set activity across cell subtypes.

1.4.3.2 Input

Input Content	Data Name	Description
Single-cell count matrix	GSE1810_raw_count	Original single-cell expression matrix

1.4.3.3 Parameters

Filter low-quality cells by mitochondrial proportion; no global ComBat
Normalization: LogNormalize; Top 2000 highly variable genes
PCA top 15 components; clustering resolution = 1.5; UMAP for visualization
Annotation: SingleR automatic annotation + CellMarker manual correction
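Two of these per-cell computations are easy to show explicitly. LogNormalize, as implemented in Seurat, divides each gene count by the cell's total counts, multiplies by a scale factor (10,000 by default) and applies a natural-log transform with a pseudocount of 1. The source does not name the toolkit, so treating this as Seurat's method is an assumption. The mitochondrial proportion is the share of counts from genes whose symbol starts with `MT-`. No cutoff is given in the parameters, so none is assumed in this Python sketch.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """LogNormalize one cell: ln(1 + count / total * scale_factor).

    counts: {gene: raw count}; the cell must have a non-zero total.
    """
    total = sum(counts.values())
    return {g: math.log1p(c / total * scale_factor) for g, c in counts.items()}


def mito_fraction(counts, prefix="MT-"):
    """Fraction of a cell's counts coming from mitochondrial genes."""
    total = sum(counts.values())
    mito = sum(c for g, c in counts.items() if g.startswith(prefix))
    return mito / total
```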

1.4.3.4 Output

Output Content	Data Name	Description
Cell annotation result	cell_type_annotation.csv	Cell subtype classification and labeling
Gene set activity matrix	aucell_activity_score.csv	Functional activity score of hub gene set
Single-cell visualization	umap_cluster_plot.png	UMAP plot of clusters and cell distribution

1.5 Part 5 Causal Inference, Diagnostic Modeling and Mechanism Validation

1.5.1 Module 12 Mendelian Randomization Analysis

1.5.1.1 Function

Infer causal relationship between hub genes and HFrEF using eQTL and GWAS summary data.

1.5.1.2 Input

Input Content	Data Name	Description
Core hub gene list	zwhqd_hfref_hub_genes.txt	Candidate causal genes
eQTL dataset	eqtl_summary.csv	Expression quantitative trait loci data
GWAS summary data	ebi-a-GCST90018910	HFrEF genome-wide association summary

1.5.1.3 Parameters

SNP threshold: P < 5e-8; LD clumping: r² < 0.001, window = 10000 kb
Methods: IVW, Wald Ratio, MR-Egger, Weighted Median, MR-PRESSO
Threshold: P < 0.05
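For orientation, the two simplest estimators in this list can be written out. With a single instrument, the Wald ratio divides the SNP-outcome effect by the SNP-exposure effect. With several instruments, the fixed-effect inverse-variance weighted (IVW) estimate combines the per-SNP ratios with weights proportional to the squared exposure effect over the outcome variance. The Python sketch below uses first-order standard errors. It illustrates the formulas and is not a replacement for a dedicated MR package.

```python
import math


def wald_ratio(beta_exp, beta_out, se_out):
    """Single-SNP causal estimate and first-order standard error."""
    estimate = beta_out / beta_exp
    se = se_out / abs(beta_exp)
    return estimate, se


def ivw(beta_exp, beta_out, se_out):
    """Fixed-effect IVW estimate over several independent SNPs.

    All arguments are equal-length sequences, one entry per SNP.
    """
    weights = [bx * bx / (s * s) for bx, s in zip(beta_exp, se_out)]
    numerator = sum(
        bx * by / (s * s) for bx, by, s in zip(beta_exp, beta_out, se_out)
    )
    estimate = numerator / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return estimate, se
```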

1.5.1.4 Output

Output Content	Data Name	Description
MR analysis result table	mr_analysis_result.csv	Causal effect estimation and statistics
Core causal gene list	causal_core_genes.txt	Core genes with confirmed causal association

1.5.2 Module 13 Colocalization and Causal Direction Verification

1.5.2.1 Function

Validate MR reliability and confirm causal orientation of candidate genes.

1.5.2.2 Input

Input Content	Data Name	Description
MR instrumental variables	snp_iv_list.csv	Significant SNP IVs from MR analysis
Omics summary data	eqtl & gwas summary	Matched eQTL and GWAS summary dataset

1.5.2.3 Methods

Bayesian colocalization via coloc package, shared causal variant judged by PP.H4
Steiger test and bidirectional MR to check the causal direction and rule out reverse causation (horizontal pleiotropy is addressed by MR-Egger and MR-PRESSO in Module 12)

1.5.2.4 Output

Output Content	Data Name	Description
Colocalization statistical results	coloc_result.csv	Posterior probability of shared causal locus
Final confirmed core genes	verified_core_genes.txt	Validated HFrEF pathogenic genes

1.5.3 Module 14 Diagnostic Model Construction

1.5.3.1 Function

Build and validate HFrEF diagnostic prediction model based on core causal genes.

1.5.3.2 Input

Input Content	Data Name	Description
Core gene expression matrix	core_gene_expression.csv	Expression profile of verified core genes
Sample grouping metadata	sample_group_info.csv	HFrEF and control group labels

1.5.3.3 Workflow

Univariate logistic screening → multivariate logistic regression → nomogram construction → ROC, calibration and DCA evaluation.
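The ROC evaluation at the end of this workflow summarizes discrimination as the AUC, which equals the probability that a randomly chosen HFrEF sample receives a higher predicted score than a randomly chosen control (ties count one half). A minimal Python sketch of that pairwise definition is given below; the label coding (1 = HFrEF, 0 = control) is an assumption.

```python
def roc_auc(scores, labels):
    """AUC from predicted scores via pairwise comparison.

    scores: predicted probabilities or risk scores
    labels: 1 for cases, 0 for controls (both classes must be present)
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in sample size, which is fine for cohorts of this scale but not for very large datasets.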

1.5.3.4 Output

Output Content	Data Name	Description
Diagnostic nomogram	model_nomogram.png	Visual predictive model diagram
Model evaluation results	roc / calibration / dca curve	AUC, calibration performance and clinical net benefit

1.5.4 Module 15 Single-Gene GSEA Analysis

1.5.4.1 Function

Reveal downstream biological pathways and molecular mechanisms mediated by the core gene CASP8.

1.5.4.2 Input

Input Content	Data Name	Description
Normalized transcriptome matrix	log2 expression matrix	Batch-corrected log2 expression data
Core representative gene	CASP8	Key validated pathogenic gene

1.5.4.3 Core Procedures

Group samples according to high and low expression level of CASP8.
Identify differential genes between two groups using limma.
Perform gene set enrichment analysis via clusterProfiler to explore enriched signaling pathways.
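The grouping in the first step can be sketched as below. The source does not say where the high/low boundary lies; splitting at the median CASP8 expression is a common choice and is used here purely as an assumption.

```python
from statistics import median


def split_by_expression(expr_by_sample):
    """Split samples into high and low groups at the median.

    expr_by_sample: {sample id: log2 expression of the gene of interest}
    Samples equal to the median are placed in the low group.
    """
    cut = median(expr_by_sample.values())
    high = sorted(s for s, v in expr_by_sample.items() if v > cut)
    low = sorted(s for s, v in expr_by_sample.items() if v <= cut)
    return high, low
```

The resulting group labels would then serve as the contrast for the limma comparison, whose ranked statistics feed the enrichment step.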

1.5.4.4 Output

Output Content	Data Name	Description
GSEA enrichment table	gsea_result.csv	Enriched pathway annotation and statistics
GSEA visualization plot	gsea_enrichment_plot.png	Pathway enrichment landscape diagram

1.5.5 Module 16 Molecular Docking Analysis

1.5.5.1 Function

Evaluate binding affinity and molecular interaction between ZWHQD active components and core target protein CASP8.

1.5.5.2 Input

Input Content	Data Name	Description
Representative active components	Five core ZWHQD ingredients	Bioactive small molecules from Module 1
Receptor protein structure	PDB: 4ZBW	Crystal structure of human CASP8

1.5.5.3 Core Procedures

Preprocess protein and ligand using Schrödinger Maestro.
Predict active binding pocket and perform molecular docking.
Evaluate binding energy and visualize 3D binding conformation via PyMOL.

1.5.5.4 Output

Output Content	Data Name	Description
Docking scoring table	docking_score.csv	Binding affinity ranking of each component
3D binding conformation plot	docking_complex_plot.png	Visual molecular interaction diagram

1.6 Data Specification and Usage Rules

Module	Batch Correction Required	Log2 Transformation	Data Type
1–2	No	No	Database-derived and manually curated compound/target data
3	Yes	Output both	Raw GEO and platform annotation data
4	Yes	Yes	log2 normalized expression matrix
5	Yes	Yes	log2 normalized expression matrix
6–8	Yes	Not required	Gene lists and network files
9	Yes	Yes	Hub gene list
10	Yes	Prohibited	Linear non-log2 expression matrix
11	No global batch	Internal log normalization	Single-cell raw count matrix
12–16	Yes	Yes (except docking)	Expression matrix and core gene sets


--- title: "Methods & Technical Scheme" format: html: theme: cosmo toc: true toc-depth: 3 number-sections: true smooth-scroll: true page-layout: full fig-dpi: 300 fig-align: center execute: eval: false echo: true warning: false message: false --- # Modular Analysis Workflow & Technical Scheme ## Part 1 TCM Component and Target Mining ### Module 1 ZWHQD Active Component Screening #### Function Based on the DCABM-TCM database, blood-absorbed prototype and metabolic compounds of six medicinal herbs in Zhenwu Huangqi Decoction were collected. Strict filtering rules, manual supplementation of key metabolites, and batch retrieval of chemical descriptors from PubChem were performed to generate a standardized dataset of candidate active components. #### Screening Criteria - Compounds without valid PubChem CID were removed. - Highly toxic diester-type diterpenoid alkaloids from *Aconiti Radix Lateralis Preparata* were excluded. - Endogenous nutrients including sugars, amino acids, vitamins, and common organic acids were discarded. - The key astragalus metabolite **cycloastragenol** was manually supplemented. - Duplicates and ambiguous annotations were trimmed; only categories with documented heart failure-related activity were retained. #### Input | Input Content | Data Name | Description | |---------------|-----------|-------------| | Raw herbal compound table | zwhqd_compound_dcabm.xlsx | Original component data downloaded from DCABM-TCM | | Manual revision table | compounds_add_manu.xlsx | Artificial supplementation and correction of components | | Compound property filtering table | compounds_properties_manu.xlsx | Toxicity and category screening annotation table | #### Core Procedures 1. Import raw TCM component data from DCABM-TCM. 2. Text cleaning, compound separation, and extraction of PubChem CID, compound name, and molecular formula. 3. Manual addition of cycloastragenol. 4. 
Manual correction of CID and nomenclature; removal of entries missing CID and duplicate records. 5. Filtering by toxicity classification and compound category to eliminate unqualified ingredients. 6. Parallel batch query via PubChem to obtain IUPAC name, molecular weight, SMILES, InChI, and InChIKey. 7. Data integration, deduplication, statistical summary of ingredient numbers per herb, shared compound analysis, and export of supplementary tables. #### Output | Output Content | Data Name | Description | |----------------|-----------|-------------| | Standardized active component dataset | compounds_final.RData / compounds_final.xlsx | Filtered and annotated ZWHQD active components | | Supplementary tables | Table S1, Table S2 | Component classification and basic information statistics | | Shared compound statistics | compound_sharing_stats | Statistics of common ingredients across herbs | --- ### Module 2 Component Target Prediction #### Function Potential targets of qualified active components were predicted using three authoritative databases. Predictions were filtered by confidence thresholds, unified with UniProt annotation, merged and deduplicated to construct high-quality compound–target pairs. #### Prediction Rules **BATMAN-TCM** - Retain known validated targets - Predicted targets with score ≥ 0.84 **Super-Pred** - Retain known experimental targets - Predicted targets with probability > 70% and model accuracy > 90% **SwissTargetPrediction** - Predicted targets with probability > 0.1 All target symbols were standardized to official HGNC nomenclature via UniProt human reviewed database. Known and predicted targets were merged and deduplicated. 
#### Input | Input Content | Data Name | Description | |---------------|-----------|-------------| | Active component dataset | compounds_final.Rdata | Output screened components from Module 1 | | Raw target datasets | BATMAN / Super-Pred / SwissTargetPrediction raw data | Downloaded target prediction results | | Human protein annotation library | Uniprot human reviewed database | Standardize gene symbol and UniProt ID | #### Output | Output Content | Data Name | Description | |----------------|-----------|-------------| | Known component targets | compound_target_known.Rdata | Experimentally validated compound-target pairs | | Predicted component targets | compound_target_pred.Rdata | High-confidence predicted targets | | Integrated target set | compound_target_final.Rdata | Merged and deduplicated compound-target dataset | | Supplementary tables | Table S3–S6 | Target statistics and annotation tables | --- ## Part 2 GEO Data Preprocessing and Disease Target Mining ### Module 3 GEO Dataset Download, Merging and Dual Matrix Preprocessing #### Function Public transcriptomic datasets were downloaded, probe-annotated, integrated, and batch-corrected. Two standardized expression matrices were generated to adapt to downstream different analytical requirements. #### Input | Input Content | Data Name | Description | |---------------|-----------|-------------| | GEO transcriptome datasets | GSE141910, GSE17755, GSE1810 | HFrEF related expression profiling data | | Platform annotation files | GPL16043, GPL13497, GPL1219 | Probe-to-gene annotation reference | #### Core Procedures 1. Probe sets were mapped to Gene Symbol; multiple probes per gene were summarized by maximum expression value. 2. Cross-platform and cross-batch merging followed by batch correction using `sva::ComBat`. 3. 
Two normalized matrices were exported: - Matrix A: log2-transformed for DEG, WGCNA, functional enrichment - Matrix B: Linear non-log2 matrix dedicated to CIBERSORT immune infiltration #### Output | Output Content | Data Name | Description | |----------------|-----------|-------------| | Batch-corrected expression matrices | log2 matrix & linear matrix | Dual normalized matrices for different downstream analyses | | Sample grouping metadata | sample_group_info.csv | Group label of HFrEF and control samples | | Gene probe mapping table | probe_gene_mapping.csv | Annotation correspondence between probe and gene symbol | --- ### Module 4 Differential Expression Gene (DEG) Screening #### Function Differential expression genes between HFrEF and healthy controls were identified and visualized. #### Input | Input Content | Data Name | Description | |---------------|-----------|-------------| | Normalized expression matrix | log2 expression matrix | Batch-corrected log2 transformed data from Module 3 | | Sample grouping information | sample_group_info.csv | Group label for differential comparison | #### Parameters - Tool: `limma` - Loose threshold: *P* < 0.05, |log2FC| ≥ 0.585 - Stringent threshold: |log2FC| > 1 - Visualization: Volcano plot #### Output | Output Content | Data Name | Description | |----------------|-----------|-------------| | DEG gene list | deg_up / deg_down / deg_all | Up-regulated, down-regulated and total differential genes | | DEG visualization plot | deg_volcano.png | Volcano plot of differential expression | --- ### Module 5 WGCNA Weighted Gene Co-expression Network Analysis #### Function Construct scale-free co-expression networks, partition gene modules, and screen phenotype-related hub genes associated with HFrEF. 
#### Input | Input Content | Data Name | Description | |---------------|-----------|-------------| | Normalized expression matrix | log2 expression matrix | Batch-corrected log2 expression data | | Clinical phenotype data | clinical_trait_info.csv | HFrEF phenotypic characteristic data | #### Core Procedures 1. Quality control using `goodSamplesGenes()` to remove outlier samples and low-expression genes. 2. Optimal soft threshold determined by `pickSoftThreshold()`. 3. Module-trait correlation calculated via Spearman and Mantel test. 4. Hub genes filtered by Module Membership (MM ≥ 0.6) and Gene Significance (GS ≥ 0.1). 5. Key disease-related modules: MEbrown, MEroyalblue. #### Output | Output Content | Data Name | Description | |----------------|-----------|-------------| | Gene module classification | wgcna_module_gene_list | Gene distribution in each co-expression module | | WGCNA hub gene set | wgcna_hub_genes.txt | Phenotype-related core genes | --- ### Module 6 HFrEF Disease Target Integration #### Function Integrate DEGs and WGCNA hub genes to obtain candidate disease targets for HFrEF. #### Input | Input Content | Data Name | Description | |---------------|-----------|-------------| | Differential gene set | deg_all | Full DEG list from Module 4 | | Co-expression hub genes | wgcna_hub_genes.txt | Core genes from WGCNA Module 5 | #### Core Logic Intersection of DEGs and WGCNA hub genes, retaining genes with |log2FC| > 1 to ensure statistical reliability. #### Output | Output Content | Data Name | Description | |----------------|-----------|-------------| | Integrated disease target set | hfref_disease_targets.txt | Candidate HFrEF pathogenic genes | --- ## Part 3 Network Construction and Hub Gene Identification ### Module 7 Compound-Target and PPI Network Construction #### Function Construct herb–component–target network and protein–protein interaction (PPI) network of overlapping drug-disease targets. 
#### Input | Input Content | Data Name | Description | |---------------|-----------|-------------| | ZWHQD component targets | compound_target_final.Rdata | Merged compound-target genes from Module 2 | | HFrEF candidate targets | hfref_disease_targets.txt | Integrated disease genes from Module 6 | #### Parameters - Network visualization: Cytoscape 3.10.3 - PPI network: STRING database, *Homo sapiens*, confidence > 0.700 #### Output | Output Content | Data Name | Description | |----------------|-----------|-------------| | Herb-component-target network files | ctn_node.csv / ctn_edge.csv | Node and edge information of regulatory network | | PPI network files | ppi_node.csv / ppi_edge.csv | Protein-protein interaction network data | | PPI interaction matrix | ppi_adjacency_matrix.csv | Gene interaction adjacency matrix | --- ### Module 8 Hub Gene Screening via MCODE and Cytohubba #### Function Mine core sub-networks from PPI network and identify ZWHQD–HFrEF hub genes. #### Input | Input Content | Data Name | Description | |---------------|-----------|-------------| | PPI network file | ppi_edge.csv | Interaction network from Module 7 | #### Parameters - MCODE: degree cutoff = 2, node score cutoff = 0.2, k-core = 2 - Cytohubba: 4 local + 6 global topological algorithms - Integration: MCODE core genes + Cytohubba top genes + DEG/WGCNA intersection genes #### Output | Output Content | Data Name | Description | |----------------|-----------|-------------| | Final hub gene set | zwhqd_hfref_hub_genes.txt | 117 core genes for subsequent analysis | --- ## Part 4 Functional and Mechanism Analysis ### Module 9 GO/KEGG Functional Enrichment Analysis #### Function Annotate biological functions and core signaling pathways of hub genes. 
#### Input | Input Content | Data Name | Description | |---------------|-----------|-------------| | Hub gene list | zwhqd_hfref_hub_genes.txt | Core genes screened from Module 8 | #### Parameters - ID conversion: `org.Hs.eg.db` - Tool: `clusterProfiler` - Threshold: *P* < 0.001 - Dimension: BP, CC, MF, KEGG pathway #### Output | Output Content | Data Name | Description | |----------------|-----------|-------------| | Enrichment result tables | go_anno.csv / kegg_anno.csv | GO and KEGG annotation statistics | | Enrichment plots | bubble_plot / bar_plot | Visualization of functional enrichment | --- ### Module 10 CIBERSORT Immune Infiltration Analysis #### Function Estimate relative abundance of 22 immune cell subtypes, compare intergroup differences, and analyze gene–immune correlation. #### Input | Input Content | Data Name | Description | |---------------|-----------|-------------| | Non-log2 expression matrix | linear_expression_matrix | Unlogged batch-corrected data for immune deconvolution | | Sample grouping table | sample_group_info.csv | Group label for difference comparison | #### Rules - Only non-log2 matrix is allowed; log-transformed data are prohibited - Reference signature: LM22 - Retain samples with deconvolution *P* < 0.05 #### Output | Output Content | Data Name | Description | |----------------|-----------|-------------| | Immune cell abundance matrix | immune_cell_abundance.csv | Relative proportion of 22 immune cell subtypes | | Intergroup difference plots | immune_boxplot.png | Immune cell difference between groups | | Correlation heatmap | gene_immune_heatmap.png | Correlation between hub genes and immune cells | --- ### Module 11 Single-Cell RNA-seq Analysis #### Function Perform single-cell quality control, dimensionality reduction, clustering, cell type annotation, and evaluate hub gene set activity across cell subtypes. 
#### Input | Input Content | Data Name | Description | |---------------|-----------|-------------| | Single-cell count matrix | GSE1810_raw_count | Original single-cell expression matrix | #### Parameters - Filter low-quality cells by mitochondrial proportion; no global ComBat - Normalization: LogNormalize; Top 2000 highly variable genes - PCA top 15 components; resolution = 1.5; UMAP clustering - Annotation: SingleR automatic annotation + CellMarker manual correction #### Output | Output Content | Data Name | Description | |----------------|-----------|-------------| | Cell annotation result | cell_type_annotation.csv | Cell subtype classification and labeling | | Gene set activity matrix | aucell_activity_score.csv | Functional activity score of hub gene set | | Single-cell visualization | umap_cluster_plot.png | UMAP clustering and cell distribution plot | --- ## Part 5 Causal Inference, Diagnostic Modeling and Mechanism Validation ### Module 12 Mendelian Randomization Analysis #### Function Infer causal relationship between hub genes and HFrEF using eQTL and GWAS summary data. 
#### Input | Input Content | Data Name | Description | |---------------|-----------|-------------| | Core hub gene list | zwhqd_hfref_hub_genes.txt | Candidate causal genes | | eQTL dataset | eqtl_summary.csv | Expression quantitative trait loci data | | GWAS summary data | ebi-a-GCST90018910 | HFrEF genome-wide association summary | #### Parameters - SNP threshold: *P* < 5e-8; LD clumping: r² < 0.001, window = 10000 kb - Methods: IVW, Wald Ratio, MR-Egger, Weighted Median, MR-PRESSO - Threshold: *P* < 0.05 #### Output | Output Content | Data Name | Description | |----------------|-----------|-------------| | MR analysis result table | mr_analysis_result.csv | Causal effect estimation and statistics | | Core causal gene list | causal_core_genes.txt | Core genes with confirmed causal association | --- ### Module 13 Colocalization and Causal Direction Verification #### Function Validate MR reliability and confirm causal orientation of candidate genes. #### Input | Input Content | Data Name | Description | |---------------|-----------|-------------| | MR instrumental variables | snp_iv_list.csv | Significant SNP IVs from MR analysis | | Omics summary data | eqtl & gwas summary | Matched eQTL and GWAS summary dataset | #### Methods 1. Bayesian colocalization via `coloc` package, shared causal variant judged by PP.H4 2. Steiger test and bidirectional MR to avoid reverse causation and pleiotropy #### Output | Output Content | Data Name | Description | |----------------|-----------|-------------| | Colocalization statistical results | coloc_result.csv | Posterior probability of shared causal locus | | Final confirmed core genes | verified_core_genes.txt | Validated HFrEF pathogenic genes | --- ### Module 14 Diagnostic Model Construction #### Function Build and validate HFrEF diagnostic prediction model based on core causal genes. 

#### Input

| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Core gene expression matrix | core_gene_expression.csv | Expression profile of verified core genes |
| Sample grouping metadata | sample_group_info.csv | HFrEF and control group labels |

#### Workflow

Univariate logistic screening → multivariate logistic regression → nomogram construction → ROC, calibration and DCA evaluation.

#### Output

| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Diagnostic nomogram | model_nomogram.png | Visual predictive model diagram |
| Model evaluation results | roc / calibration / dca curve | AUC, calibration efficiency and clinical net benefit |

---

### Module 15 Single-Gene GSEA Analysis

#### Function

Reveal downstream biological pathways and molecular mechanisms mediated by the core gene **CASP8**.

#### Input

| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Normalized transcriptome matrix | log2 expression matrix | Batch-corrected log2 expression data |
| Core representative gene | CASP8 | Key validated pathogenic gene |

#### Core Procedures

1. Group samples according to high and low expression level of *CASP8*.
2. Identify differential genes between two groups using `limma`.
3. Perform gene set enrichment analysis via `clusterProfiler` to explore enriched signaling pathways.

#### Output

| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| GSEA enrichment table | gsea_result.csv | Enriched pathway annotation and statistics |
| GSEA visualization plot | gsea_enrichment_plot.png | Pathway enrichment landscape diagram |

---

### Module 16 Molecular Docking Analysis

#### Function

Evaluate binding affinity and molecular interaction between ZWHQD active components and core target protein CASP8.

#### Input

| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Representative active components | Five core ZWHQD ingredients | Bioactive small molecules from Module 1 |
| Receptor protein structure | PDB: 4ZBW | Crystal structure of human CASP8 |

#### Core Procedures

1. Preprocess protein and ligand using Schrödinger Maestro.
2. Predict active binding pocket and perform molecular docking.
3. Evaluate binding energy and visualize 3D binding conformation via PyMOL.

#### Output

| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Docking scoring table | docking_score.csv | Binding affinity ranking of each component |
| 3D binding conformation plot | docking_complex_plot.png | Visual molecular interaction diagram |

---

## Data Specification and Usage Rules

| Module | Batch Correction Required | Log2 Transformation | Data Type |
|--------|---------------------------|---------------------|-----------|
| 1–2 | No | No | Database and manual curated compound/target data |
| 3 | Yes | Output both | Raw GEO and platform annotation data |
| 4 | Yes | Yes | log2 normalized expression matrix |
| 5 | Yes | Yes | log2 normalized expression matrix |
| 6–8 | Yes | Not required | Gene lists and network files |
| 9 | Yes | Yes | Hub gene list |
| 10 | Yes | Prohibited | Linear non-log2 expression matrix |
| 11 | No global batch | Internal log normalization | Single-cell raw count matrix |
| 12–16 | Yes | Yes (except docking) | Expression matrix and core gene sets |

---

<table class="nav-table" width="100%">
<tr>
<td align="left">

[Home](index.qmd) | [About](about.qmd) | [Results](results.qmd)

</td>
<td align="right">

[Start Analysis](chapters/Part1_TCM/01_active_component_screening.qmd)

</td>
</tr>
</table>