ZWHQD for HFrEF

Methods & Technical Scheme

Authors and affiliations

Kun Hou: Health Science Center, Xi’an Jiaotong University; Hanzhong Traditional Chinese Medicine Hospital

Supervisor’s name: Health Science Center, Xi’an Jiaotong University; The First Affiliated Hospital of Xi’an Jiaotong University

1 Modular Analysis Workflow & Technical Scheme

1.1 Part 1 TCM Component and Target Mining

1.1.1 Module 1 ZWHQD Active Component Screening

1.1.1.1 Function

Blood-absorbed prototype compounds and metabolites of the six herbs in Zhenwu Huangqi Decoction (ZWHQD) were collected from the DCABM-TCM database. Strict filtering rules, manual supplementation of key metabolites, and batch retrieval of chemical descriptors from PubChem were then applied to produce a standardized dataset of candidate active components.

1.1.1.2 Screening Criteria

  • Compounds without a valid PubChem CID were removed.
  • Highly toxic diester-type diterpenoid alkaloids from Aconiti Radix Lateralis Preparata were excluded.
  • Endogenous nutrients, including sugars, amino acids, vitamins, and common organic acids, were discarded.
  • The key Astragalus metabolite cycloastragenol was added manually.
  • Duplicates and ambiguously annotated entries were removed; only compound categories with documented heart failure-related activity were retained.

1.1.1.3 Input

| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Raw herbal compound table | zwhqd_compound_dcabm.xlsx | Original component data downloaded from DCABM-TCM |
| Manual revision table | compounds_add_manu.xlsx | Manual supplementation and correction of components |
| Compound property filtering table | compounds_properties_manu.xlsx | Toxicity and category screening annotations |

1.1.1.4 Core Procedures

  1. Import raw TCM component data from DCABM-TCM.
  2. Text cleaning, compound separation, and extraction of PubChem CID, compound name, and molecular formula.
  3. Manual addition of cycloastragenol.
  4. Manual correction of CID and nomenclature; removal of entries missing CID and duplicate records.
  5. Filtering by toxicity classification and compound category to eliminate unqualified ingredients.
  6. Parallel batch query via PubChem to obtain IUPAC name, molecular weight, SMILES, InChI, and InChIKey.
  7. Data integration, deduplication, statistical summary of ingredient numbers per herb, shared compound analysis, and export of supplementary tables.
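The filtering and sharing steps above can be sketched in plain Python. This is an illustrative sketch, not the project's code: the record keys (herb, name, cid, category), the category labels in EXCLUDED_CATEGORIES, and the toy records are all hypothetical stand-ins for the columns of compounds_properties_manu.xlsx.

```python
# Hypothetical category labels standing in for the manual annotation table.
EXCLUDED_CATEGORIES = {
    "diester-type diterpenoid alkaloid",               # toxic aconite alkaloids
    "sugar", "amino acid", "vitamin", "organic acid",  # endogenous nutrients
}


def filter_compounds(records):
    """Drop entries without a CID or in an excluded category, then deduplicate
    on (herb, CID) so that a compound is counted once per herb."""
    seen = set()
    kept = []
    for rec in records:
        cid = rec.get("cid")
        if not cid:  # missing or empty PubChem CID
            continue
        if rec.get("category") in EXCLUDED_CATEGORIES:
            continue
        key = (rec["herb"], cid)
        if key in seen:  # duplicate record for the same herb
            continue
        seen.add(key)
        kept.append(rec)
    return kept


def shared_compounds(records):
    """Return {cid: set_of_herbs} for compounds found in more than one herb."""
    herbs_by_cid = {}
    for rec in records:
        herbs_by_cid.setdefault(rec["cid"], set()).add(rec["herb"])
    return {cid: herbs for cid, herbs in herbs_by_cid.items() if len(herbs) > 1}


# Toy records with made-up CIDs.
records = [
    {"herb": "herb_A", "name": "compound x", "cid": 101, "category": "flavonoid"},
    {"herb": "herb_A", "name": "compound x (dup)", "cid": 101, "category": "flavonoid"},
    {"herb": "herb_B", "name": "compound x", "cid": 101, "category": "flavonoid"},
    {"herb": "herb_B", "name": "glucose", "cid": 202, "category": "sugar"},
    {"herb": "herb_B", "name": "unnamed peak", "cid": None, "category": "flavonoid"},
]
kept = filter_compounds(records)
print(len(kept), shared_compounds(kept))
```

Deduplicating on (herb, CID) rather than on CID alone keeps each compound once per herb, which is what both the per-herb counts and the shared-compound statistics require.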

1.1.1.5 Output

| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Standardized active component dataset | compounds_final.RData / compounds_final.xlsx | Filtered and annotated ZWHQD active components |
| Supplementary tables | Table S1, Table S2 | Component classification and basic information statistics |
| Shared compound statistics | compound_sharing_stats | Statistics of common ingredients across herbs |

1.1.2 Module 2 Component Target Prediction

1.1.2.1 Function

Potential targets of the screened active components were predicted with three databases (BATMAN-TCM, Super-Pred, and SwissTargetPrediction). Predictions were filtered by confidence thresholds, target names were unified against UniProt annotation, and the results were merged and deduplicated into high-confidence compound–target pairs.

1.1.2.2 Prediction Rules

BATMAN-TCM
  • Known (validated) targets were retained.
  • Predicted targets were retained if score ≥ 0.84.

Super-Pred
  • Known (experimental) targets were retained.
  • Predicted targets were retained if probability > 70% and model accuracy > 90%.

SwissTargetPrediction
  • Predicted targets were retained if probability > 0.1.

All target symbols were standardized to official HGNC nomenclature using the UniProt human reviewed database. Known and predicted targets were then merged and deduplicated.
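The rules above amount to a per-source filter followed by symbol standardization and deduplication. The sketch assumes a hypothetical record layout (keys compound, target, source, kind, score, probability, model_accuracy), writes the Super-Pred percentages as fractions, and uses a plain dict (symbol_map) as a stand-in for the UniProt human reviewed lookup.

```python
def passes_threshold(rec):
    """Apply the source-specific retention rule to one compound-target record."""
    source, kind = rec["source"], rec["kind"]
    if kind == "known":  # known/validated targets are retained as-is
        return source in {"BATMAN-TCM", "Super-Pred"}
    if source == "BATMAN-TCM":
        return rec["score"] >= 0.84
    if source == "Super-Pred":  # 70% and 90% written as fractions
        return rec["probability"] > 0.70 and rec["model_accuracy"] > 0.90
    if source == "SwissTargetPrediction":
        return rec["probability"] > 0.1
    return False


def merge_targets(records, symbol_map):
    """Filter records, map target names to HGNC symbols, and deduplicate pairs."""
    pairs = set()
    for rec in records:
        if not passes_threshold(rec):
            continue
        symbol = symbol_map.get(rec["target"].upper())
        if symbol is None:  # not found in the reviewed human reference
            continue
        pairs.add((rec["compound"], symbol))
    return sorted(pairs)
```

Collecting (compound, symbol) tuples in a set makes merging and deduplication a single step: the same pair predicted by several databases collapses to one entry.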

1.1.2.3 Input

| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Active component dataset | compounds_final.Rdata | Screened components output by Module 1 |
| Raw target datasets | BATMAN / Super-Pred / SwissTargetPrediction raw data | Downloaded target prediction results |
| Human protein annotation library | UniProt human reviewed database | Standardization of gene symbols and UniProt IDs |

1.1.2.4 Output

| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Known component targets | compound_target_known.Rdata | Experimentally validated compound-target pairs |
| Predicted component targets | compound_target_pred.Rdata | High-confidence predicted targets |
| Integrated target set | compound_target_final.Rdata | Merged and deduplicated compound-target dataset |
| Supplementary tables | Table S3–S6 | Target statistics and annotation tables |

1.2 Part 2 GEO Data Preprocessing and Disease Target Mining

1.2.1 Module 3 GEO Dataset Download, Merging and Dual Matrix Preprocessing

1.2.1.1 Function

Public transcriptomic datasets were downloaded, probe-annotated, merged, and batch-corrected. Two standardized expression matrices were generated to meet the input requirements of different downstream analyses.

1.2.1.2 Input

| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| GEO transcriptome datasets | GSE141910, GSE17755, GSE1810 | HFrEF-related expression profiling data |
| Platform annotation files | GPL16043, GPL13497, GPL1219 | Probe-to-gene annotation reference |

1.2.1.3 Core Procedures

  1. Probe sets were mapped to gene symbols; multiple probes per gene were collapsed by maximum expression value.
  2. Datasets were merged across platforms and batches, followed by batch correction with sva::ComBat.
  3. Two normalized matrices were exported:
    • Matrix A: log2-transformed; used for DEG, WGCNA, and functional enrichment analyses.
    • Matrix B: linear (non-log2) matrix, reserved for CIBERSORT immune infiltration analysis.
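Probe collapsing and the two-matrix output can be sketched as follows. Two assumptions are made here because the text does not spell them out: "maximum expression value" is read as keeping the probe with the highest mean expression across samples (similar in spirit to the MaxMean option of WGCNA's collapseRows()), and Matrix B is assumed to be the batch-corrected log2 matrix back-transformed with 2^x. The dictionary layout is hypothetical.

```python
from statistics import fmean


def collapse_probes(expr, probe_to_gene):
    """expr: {probe_id: [value per sample]}; probe_to_gene: {probe_id: symbol}.
    For each gene, keep the probe with the highest mean expression."""
    best = {}
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if not gene:  # unannotated probe
            continue
        mean_value = fmean(values)
        if gene not in best or mean_value > best[gene][0]:
            best[gene] = (mean_value, values)
    return {gene: values for gene, (_, values) in best.items()}


def to_linear(log2_expr):
    """Back-transform a log2 matrix to linear scale (assumed route to Matrix B)."""
    return {gene: [2.0 ** v for v in values] for gene, values in log2_expr.items()}
```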

1.2.1.4 Output

| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Batch-corrected expression matrices | log2 matrix & linear matrix | Dual normalized matrices for different downstream analyses |
| Sample grouping metadata | sample_group_info.csv | Group label of HFrEF and control samples |
| Gene probe mapping table | probe_gene_mapping.csv | Annotation correspondence between probe and gene symbol |

1.2.2 Module 4 Differential Expression Gene (DEG) Screening

1.2.2.1 Function

Differentially expressed genes (DEGs) between HFrEF samples and healthy controls were identified and visualized.

1.2.2.2 Input

| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Normalized expression matrix | log2 expression matrix | Batch-corrected log2 transformed data from Module 3 |
| Sample grouping information | sample_group_info.csv | Group label for differential comparison |

1.2.2.3 Parameters

  • Tool: limma
  • Loose threshold: P < 0.05 and |log2FC| ≥ 0.585 (fold change of about 1.5, since log2(1.5) ≈ 0.585)
  • Stringent threshold: |log2FC| > 1 (fold change > 2)
  • Visualization: volcano plot
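The two fold-change cut-offs can be made concrete with a small classifier. It does not reproduce limma's moderated t-statistics; it assumes the log2FC and P values have already been computed by limma, and it assumes the stringent tier keeps the P < 0.05 requirement, which the text does not state explicitly.

```python
import math

LOOSE_LFC = 0.585   # about log2(1.5): at least a 1.5-fold change
STRICT_LFC = 1.0    # more than a 2-fold change
P_CUTOFF = 0.05


def classify(log2fc, p_value, stringent=False):
    """Label a gene 'up', 'down' or 'ns' from its limma log2FC and P value."""
    if p_value >= P_CUTOFF:
        return "ns"
    if stringent:
        passed = abs(log2fc) > STRICT_LFC   # strictly greater than 1
    else:
        passed = abs(log2fc) >= LOOSE_LFC   # at least 0.585
    if not passed:
        return "ns"
    return "up" if log2fc > 0 else "down"


print(round(math.log2(1.5), 3))  # 0.585
```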

1.2.2.4 Output

| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| DEG gene list | deg_up / deg_down / deg_all | Up-regulated, down-regulated and total differential genes |
| DEG visualization plot | deg_volcano.png | Volcano plot of differential expression |

1.2.3 Module 5 WGCNA Weighted Gene Co-expression Network Analysis

1.2.3.1 Function

Construct a scale-free co-expression network, partition genes into co-expression modules, and screen hub genes associated with the HFrEF phenotype.

1.2.3.2 Input

| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Normalized expression matrix | log2 expression matrix | Batch-corrected log2 expression data |
| Clinical phenotype data | clinical_trait_info.csv | HFrEF phenotypic characteristic data |

1.2.3.3 Core Procedures

  1. Quality control: goodSamplesGenes() was used to flag genes and samples with excessive missing values or zero variance, and outlier samples and low-expression genes were removed.
  2. The optimal soft-thresholding power was determined with pickSoftThreshold().
  3. Module–trait associations were assessed with Spearman correlation and the Mantel test.
  4. Hub genes were selected by module membership (MM ≥ 0.6) and gene significance (GS ≥ 0.1).
  5. Key disease-related modules: brown and royalblue (eigengenes MEbrown and MEroyalblue).
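Module membership (MM) is the correlation between a gene's expression profile and its module eigengene (the first principal component of the module's expression), and gene significance (GS) is the correlation between the gene and the trait. The sketch applies the MM ≥ 0.6 and GS ≥ 0.1 filter using absolute Pearson correlations, a common WGCNA convention assumed here; eigengenes are taken as precomputed, and the genes, modules, and trait values are toy data.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def wgcna_hub_genes(expr, module_of, eigengenes, trait, mm_cut=0.6, gs_cut=0.1):
    """Select genes with |MM| >= mm_cut and |GS| >= gs_cut in the given modules.

    expr: {gene: [value per sample]}; module_of: {gene: module colour};
    eigengenes: {module colour: [eigengene value per sample]}; trait: [per sample].
    """
    hubs = []
    for gene, values in expr.items():
        module = module_of.get(gene)
        if module not in eigengenes:  # genes outside the key modules are skipped
            continue
        mm = abs(pearson(values, eigengenes[module]))
        gs = abs(pearson(values, trait))
        if mm >= mm_cut and gs >= gs_cut:
            hubs.append(gene)
    return hubs
```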

1.2.3.4 Output

| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Gene module classification | wgcna_module_gene_list | Gene distribution in each co-expression module |
| WGCNA hub gene set | wgcna_hub_genes.txt | Phenotype-related core genes |

1.2.4 Module 6 HFrEF Disease Target Integration

1.2.4.1 Function

Integrate DEGs and WGCNA hub genes to obtain candidate disease targets for HFrEF.

1.2.4.2 Input

| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Differential gene set | deg_all | Full DEG list from Module 4 |
| Co-expression hub genes | wgcna_hub_genes.txt | Core genes from WGCNA Module 5 |

1.2.4.3 Core Logic

DEGs and WGCNA hub genes were intersected, and only genes with |log2FC| > 1 (fold change > 2) were retained, so that the candidate targets show a substantial expression difference.

1.2.4.4 Output

| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Integrated disease target set | hfref_disease_targets.txt | Candidate HFrEF pathogenic genes |

1.3 Part 3 Network Construction and Hub Gene Identification

1.3.1 Module 7 Compound-Target and PPI Network Construction

1.3.1.1 Function

Construct the herb–component–target network and the protein–protein interaction (PPI) network of the targets shared by ZWHQD and HFrEF.

1.3.1.2 Input

| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| ZWHQD component targets | compound_target_final.Rdata | Merged compound-target genes from Module 2 |
| HFrEF candidate targets | hfref_disease_targets.txt | Integrated disease genes from Module 6 |

1.3.1.3 Parameters

  • Network visualization: Cytoscape 3.10.3
  • PPI network: STRING database, Homo sapiens, minimum interaction score > 0.700 (high confidence)
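A minimal sketch of the overlap-and-filter step that precedes loading the network into Cytoscape. It assumes STRING edges are available as (protein_a, protein_b, combined_score) with scores on a 0–1 scale (divide by 1000 first if an export uses 0–1000); the gene names and scores in the example are invented. Targets without any retained edge are dropped, which corresponds to hiding disconnected nodes.

```python
def overlapping_targets(compound_targets, disease_targets):
    """Targets shared by ZWHQD components and HFrEF."""
    return set(compound_targets) & set(disease_targets)


def build_ppi(edges, targets, min_score=0.700):
    """Keep undirected edges among `targets` whose score exceeds min_score.

    Returns (sorted nodes, sorted edges, symmetric 0/1 adjacency matrix).
    """
    targets = set(targets)
    kept = set()
    for a, b, score in edges:
        if a != b and score > min_score and a in targets and b in targets:
            kept.add(tuple(sorted((a, b))))  # store each undirected edge once
    nodes = sorted({node for edge in kept for node in edge})
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in kept:
        matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = 1
    return nodes, sorted(kept), matrix
```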

1.3.1.4 Output

| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Herb-component-target network files | ctn_node.csv / ctn_edge.csv | Node and edge information of regulatory network |
| PPI network files | ppi_node.csv / ppi_edge.csv | Protein-protein interaction network data |
| PPI interaction matrix | ppi_adjacency_matrix.csv | Gene interaction adjacency matrix |

1.3.2 Module 8 Hub Gene Screening via MCODE and Cytohubba

1.3.2.1 Function

Mine core sub-networks from the PPI network and identify ZWHQD–HFrEF hub genes.

1.3.2.2 Input

| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| PPI network file | ppi_edge.csv | Interaction network from Module 7 |

1.3.2.3 Parameters

  • MCODE: degree cutoff = 2, node score cutoff = 0.2, k-core = 2
  • Cytohubba: 4 local + 6 global topological algorithms
  • Integration: MCODE core genes + Cytohubba top genes + DEG/WGCNA intersection genes
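Two of the simpler building blocks can be sketched directly: the k-core filter that MCODE applies (k-core = 2) and cytoHubba's Degree ranking, one of its local methods. This is only a partial illustration: MCODE's vertex weighting and node score cutoff, and the other cytoHubba algorithms (such as MCC), are not reproduced, and the edge list is a toy example.

```python
def adjacency_from_edges(edges):
    """Build a symmetric {node: set(neighbours)} map from undirected edges."""
    adj = {}
    for a, b in edges:
        if a == b:  # ignore self-loops
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def k_core(adj, k):
    """Return the k-core: repeatedly delete nodes with fewer than k neighbours."""
    core = {n: set(nbrs) for n, nbrs in adj.items()}
    while True:
        low = [n for n, nbrs in core.items() if len(nbrs) < k]
        if not low:
            return core
        for node in low:
            for nbr in core.pop(node):
                core[nbr].discard(node)  # keep the remaining graph symmetric


def top_by_degree(adj, n):
    """cytoHubba-style Degree ranking: most-connected nodes first, ties by name."""
    return sorted(adj, key=lambda node: (-len(adj[node]), node))[:n]
```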

1.3.2.4 Output

| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Final hub gene set | zwhqd_hfref_hub_genes.txt | 117 core genes for subsequent analysis |

1.4 Part 4 Functional and Mechanism Analysis

1.4.1 Module 9 GO/KEGG Functional Enrichment Analysis

1.4.1.1 Function

Annotate biological functions and core signaling pathways of hub genes.

1.4.1.2 Input

| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Hub gene list | zwhqd_hfref_hub_genes.txt | Core genes screened from Module 8 |

1.4.1.3 Parameters

  • ID conversion: org.Hs.eg.db
  • Tool: clusterProfiler
  • Threshold: P < 0.001
  • Categories: GO biological process (BP), cellular component (CC), molecular function (MF), and KEGG pathways
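clusterProfiler's over-representation analysis scores each GO term or KEGG pathway with a one-sided hypergeometric test. The sketch computes that P value from the four counts involved; the counts used to check it are invented, and multiple-testing adjustment is left out because the text does not say whether P < 0.001 refers to raw or adjusted values.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """One-sided over-representation P value, P(X >= k).

    k: hub genes annotated to the term
    n: hub genes in the list (after ID conversion)
    K: genes annotated to the term in the background
    N: genes in the background universe
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def gene_ratio(k, n):
    """GeneRatio as reported by clusterProfiler, e.g. '12/117'."""
    return f"{k}/{n}"
```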

1.4.1.4 Output

| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Enrichment result tables | go_anno.csv / kegg_anno.csv | GO and KEGG annotation statistics |
| Enrichment plots | bubble_plot / bar_plot | Visualization of functional enrichment |

1.4.2 Module 10 CIBERSORT Immune Infiltration Analysis

1.4.2.1 Function

Estimate relative abundance of 22 immune cell subtypes, compare intergroup differences, and analyze gene–immune correlation.

1.4.2.2 Input

| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Non-log2 expression matrix | linear_expression_matrix | Unlogged batch-corrected data for immune deconvolution |
| Sample grouping table | sample_group_info.csv | Group label for difference comparison |

1.4.2.3 Rules

  • Only the linear (non-log2) matrix may be used; log-transformed data must not be supplied, because CIBERSORT expects expression values in linear space
  • Reference signature: LM22
  • Only samples with a deconvolution P < 0.05 were retained
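Because a log2 matrix passed to CIBERSORT gives unreliable estimates, a scale check before deconvolution is cheap insurance. The sketch uses a simplified version of the quantile heuristic found in GEO2R-generated R scripts (a 99th percentile above 100, or a range above 50 with a positive lower quartile, suggests linear-scale data); the heuristic and its thresholds are an assumption, not part of the described pipeline.

```python
import math


def quantile(sorted_vals, p):
    """Linear-interpolation quantile (same rule as R's default, type 7)."""
    h = (len(sorted_vals) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def looks_linear(values):
    """Heuristic scale check, simplified from the GEO2R-style rule."""
    s = sorted(values)
    q = [quantile(s, p) for p in (0.0, 0.25, 0.5, 0.75, 0.99, 1.0)]
    return q[4] > 100 or (q[5] - q[0] > 50 and q[1] > 0)


def check_cibersort_input(matrix):
    """matrix: {gene: [value per sample]}; refuse data that look log-scaled."""
    values = [v for row in matrix.values() for v in row]
    if min(values) < 0:
        raise ValueError("negative values: not a linear-scale expression matrix")
    if not looks_linear(values):
        raise ValueError("matrix looks log2-transformed; use the linear Matrix B")
    return matrix
```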

1.4.2.4 Output

| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Immune cell abundance matrix | immune_cell_abundance.csv | Relative proportion of 22 immune cell subtypes |
| Intergroup difference plots | immune_boxplot.png | Immune cell difference between groups |
| Correlation heatmap | gene_immune_heatmap.png | Correlation between hub genes and immune cells |

1.4.3 Module 11 Single-Cell RNA-seq Analysis

1.4.3.1 Function

Perform single-cell quality control, dimensionality reduction, clustering, cell type annotation, and evaluate hub gene set activity across cell subtypes.

1.4.3.2 Input

| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Single-cell count matrix | GSE1810_raw_count | Original single-cell expression matrix |

1.4.3.3 Parameters

  • Low-quality cells were filtered by mitochondrial read proportion; no global ComBat correction was applied
  • Normalization: LogNormalize; top 2,000 highly variable genes
  • Dimensionality reduction and clustering: first 15 principal components; clustering resolution = 1.5; UMAP for visualization
  • Annotation: SingleR automatic annotation with manual correction based on CellMarker
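The normalization and mitochondrial filter can be sketched as below. LogNormalize, as implemented in Seurat, divides each count by the cell's total count, multiplies by a scale factor (10,000 by default), and applies a natural log1p. The mitochondrial cutoff is not given in the text, so max_mt_percent is a hypothetical parameter, and the "MT-" prefix assumes human gene symbols.

```python
import math


def percent_mt(gene_counts):
    """Share of a cell's counts from mitochondrial genes (human 'MT-' prefix)."""
    total = sum(gene_counts.values())
    mt = sum(c for g, c in gene_counts.items() if g.upper().startswith("MT-"))
    return 100.0 * mt / total if total else 0.0


def filter_cells(counts, max_mt_percent):
    """Keep cells below a mitochondrial cutoff (hypothetical value; not in the text)."""
    return {cell: g for cell, g in counts.items() if percent_mt(g) < max_mt_percent}


def log_normalize(counts, scale_factor=10_000):
    """Seurat-style LogNormalize: log1p(count / cell_total * scale_factor)."""
    out = {}
    for cell, genes in counts.items():
        total = sum(genes.values())
        out[cell] = {g: math.log1p(c / total * scale_factor) for g, c in genes.items()}
    return out
```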

1.4.3.4 Output

| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Cell annotation result | cell_type_annotation.csv | Cell subtype classification and labeling |
| Gene set activity matrix | aucell_activity_score.csv | Functional activity score of hub gene set |
| Single-cell visualization | umap_cluster_plot.png | UMAP plot of cell clusters and cell-type distribution |

1.5 Part 5 Causal Inference, Diagnostic Modeling and Mechanism Validation

1.5.1 Module 12 Mendelian Randomization Analysis

1.5.1.1 Function

Infer causal relationships between hub-gene expression and HFrEF using eQTL and GWAS summary statistics.

1.5.1.2 Input

| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Core hub gene list | zwhqd_hfref_hub_genes.txt | Candidate causal genes |
| eQTL dataset | eqtl_summary.csv | Expression quantitative trait loci data |
| GWAS summary data | ebi-a-GCST90018910 | HFrEF genome-wide association summary |

1.5.1.3 Parameters

  • SNP threshold: P < 5e-8; LD clumping: r² < 0.001, window = 10000 kb
  • Methods: IVW, Wald Ratio, MR-Egger, Weighted Median, MR-PRESSO
  • Threshold: P < 0.05
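The Wald ratio (used when a gene has a single instrument) and the IVW estimate reduce to short formulas. The sketch shows the fixed-effect IVW form with a normal-approximation P value; random-effects variants inflate the standard error, and MR-Egger, weighted median, and MR-PRESSO are not reproduced. The effect sizes used to check it are invented.

```python
import math


def wald_ratio(beta_exp, beta_out, se_out):
    """Single-instrument causal estimate with its first-order standard error."""
    return beta_out / beta_exp, se_out / abs(beta_exp)


def ivw_fixed(beta_exp, beta_out, se_out):
    """Fixed-effect inverse-variance weighted estimate across instruments:
    sum(bx*by/se^2) / sum(bx^2/se^2), with SE 1/sqrt(sum(bx^2/se^2))."""
    num = sum(bx * by / se ** 2 for bx, by, se in zip(beta_exp, beta_out, se_out))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exp, se_out))
    return num / den, 1.0 / math.sqrt(den)


def two_sided_p(estimate, se):
    """Normal-approximation two-sided P value for estimate / se."""
    return math.erfc(abs(estimate / se) / math.sqrt(2))
```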

1.5.1.4 Output

| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| MR analysis result table | mr_analysis_result.csv | Causal effect estimates and statistics |
| Core causal gene list | causal_core_genes.txt | Core genes with a significant causal association (P < 0.05) |

1.5.2 Module 13 Colocalization and Causal Direction Verification

1.5.2.1 Function

Assess the robustness of the MR results and confirm the causal direction for candidate genes.

1.5.2.2 Input

| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| MR instrumental variables | snp_iv_list.csv | Significant SNP IVs from MR analysis |
| Omics summary data | eqtl & gwas summary | Matched eQTL and GWAS summary dataset |

1.5.2.3 Methods

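  1. Bayesian colocalization with the coloc package; a shared causal variant was judged by the posterior probability of hypothesis H4 (PP.H4).
  2. The Steiger directionality test and bidirectional MR were used to exclude reverse causation; pleiotropy is addressed by the MR-Egger and MR-PRESSO analyses in Module 12.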
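The coloc calculation can be sketched in plain Python: per-SNP Wakefield approximate Bayes factors are combined into posterior probabilities for hypotheses H0 to H4, as coloc.abf() does. Assumptions: SNPs are aligned across the two datasets, each trait has at most one causal variant in the region, the priors are coloc's defaults (p1 = p2 = 1e-4, p12 = 1e-5), and the prior effect SDs are 0.15 for the eQTL (treating expression as standardized, sdY = 1) and 0.2 for the case-control HFrEF GWAS. No PP.H4 cutoff is stated in the text, so none is applied.

```python
import math


def log_sum(xs):
    """log(sum(exp(x))) computed stably."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_diff(a, b):
    """log(exp(a) - exp(b)) for a >= b; -inf if the difference vanishes."""
    if b == -math.inf:
        return a
    d = b - a
    if d >= 0:
        return -math.inf
    return a + math.log(-math.expm1(d))


def log_abf(beta, se, prior_sd):
    """Wakefield approximate Bayes factor (natural log) for one SNP."""
    v, w = se * se, prior_sd * prior_sd
    r = w / (w + v)
    z = beta / se
    return 0.5 * (math.log(1 - r) + r * z * z)


def coloc_pp(b1, s1, b2, s2, prior_sd1=0.15, prior_sd2=0.2,
             p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [PP.H0, ..., PP.H4] for two aligned traits."""
    l1 = [log_abf(b, s, prior_sd1) for b, s in zip(b1, s1)]
    l2 = [log_abf(b, s, prior_sd2) for b, s in zip(b2, s2)]
    ls1, ls2 = log_sum(l1), log_sum(l2)
    ls12 = log_sum([x + y for x, y in zip(l1, l2)])  # same SNP causal for both
    lh = [
        0.0,                                                   # H0: no association
        math.log(p1) + ls1,                                    # H1: trait 1 only
        math.log(p2) + ls2,                                    # H2: trait 2 only
        math.log(p1) + math.log(p2) + log_diff(ls1 + ls2, ls12),  # H3: distinct SNPs
        math.log(p12) + ls12,                                  # H4: shared SNP
    ]
    denom = log_sum(lh)
    return [math.exp(x - denom) for x in lh]
```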

1.5.2.4 Output

| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Colocalization statistical results | coloc_result.csv | Posterior probability of shared causal locus |
| Final confirmed core genes | verified_core_genes.txt | Validated HFrEF pathogenic genes |

1.5.3 Module 14 Diagnostic Model Construction

1.5.3.1 Function

Build and validate HFrEF diagnostic prediction model based on core causal genes.

1.5.3.2 Input

| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Core gene expression matrix | core_gene_expression.csv | Expression profile of verified core genes |
| Sample grouping metadata | sample_group_info.csv | HFrEF and control group labels |

1.5.3.3 Workflow

Univariate logistic screening → multivariate logistic regression → nomogram construction → ROC, calibration and DCA evaluation.
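The model and two of its evaluation metrics can be sketched briefly. The coefficients passed to predict_prob are hypothetical, since the fitted values are not given in the text; AUC uses the Mann–Whitney formulation, and net_benefit implements the standard decision-curve formula TP/n − FP/n × pt/(1 − pt). Calibration and nomogram drawing are not shown.

```python
import math


def predict_prob(intercept, coefs, x):
    """Multivariable logistic model: P(HFrEF) = 1 / (1 + exp(-linear predictor))."""
    lp = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-lp))


def roc_auc(scores, labels):
    """AUC as P(case score > control score), with ties counting one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(probs, labels, threshold):
    """Decision-curve net benefit at threshold probability pt."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)
```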

1.5.3.4 Output

| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Diagnostic nomogram | model_nomogram.png | Visual predictive model diagram |
| Model evaluation results | roc / calibration / dca curve | AUC, calibration performance and clinical net benefit |

1.5.4 Module 15 Single-Gene GSEA Analysis

1.5.4.1 Function

Explore the downstream biological pathways and molecular mechanisms associated with the core gene CASP8.

1.5.4.2 Input

| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Normalized transcriptome matrix | log2 expression matrix | Batch-corrected log2 expression data |
| Core representative gene | CASP8 | Key validated pathogenic gene |

1.5.4.3 Core Procedures

  1. Samples were divided into high- and low-expression groups according to CASP8 expression.
  2. Differentially expressed genes between the two groups were identified with limma.
  3. Gene set enrichment analysis (GSEA) was performed with clusterProfiler to identify enriched signaling pathways.
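A sketch of the grouping and the enrichment statistic. Splitting at the median is an assumption, since the text only says high versus low CASP8 expression. enrichment_score implements the classic weighted running-sum enrichment score (weight 1) on a list ranked, for example, by limma's statistic. Permutation P values and clusterProfiler's fgsea-based implementation are not reproduced, and the ranked list used to check it is a toy example.

```python
from statistics import median


def split_by_median(expression):
    """expression: {sample: CASP8 value}. Returns (high, low) sample lists."""
    cut = median(expression.values())
    high = [s for s, v in expression.items() if v > cut]
    low = [s for s, v in expression.items() if v <= cut]
    return high, low


def enrichment_score(ranked, gene_set, weight=1.0):
    """ranked: [(gene, statistic), ...] sorted from highest to lowest statistic.
    Returns the running-sum enrichment score (largest deviation from zero)."""
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_norm = sum(abs(stat) ** weight for (_, stat), hit in zip(ranked, hits) if hit)
    if n_miss == 0 or hit_norm == 0:
        raise ValueError("gene set must partly overlap the list with non-zero statistics")
    running, best = 0.0, 0.0
    for (_, stat), hit in zip(ranked, hits):
        # Hits step up in proportion to |statistic|; misses step down uniformly.
        running += abs(stat) ** weight / hit_norm if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```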

1.5.4.4 Output

| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| GSEA enrichment table | gsea_result.csv | Enriched pathway annotation and statistics |
| GSEA visualization plot | gsea_enrichment_plot.png | Pathway enrichment landscape diagram |

1.5.5 Module 16 Molecular Docking Analysis

1.5.5.1 Function

Estimate the binding affinity and molecular interactions between ZWHQD active components and the core target protein CASP8.

1.5.5.2 Input

| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Representative active components | Five core ZWHQD ingredients | Bioactive small molecules from Module 1 |
| Receptor protein structure | PDB: 4ZBW | Crystal structure of human CASP8 |

1.5.5.3 Core Procedures

  1. The protein and ligands were prepared in Schrödinger Maestro.
  2. The active binding pocket was identified and molecular docking was performed.
  3. Binding energies were evaluated, and 3D binding conformations were visualized in PyMOL.

1.5.5.4 Output

| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Docking scoring table | docking_score.csv | Binding affinity ranking of each component |
| 3D binding conformation plot | docking_complex_plot.png | Visual molecular interaction diagram |

1.6 Data Specification and Usage Rules

| Module | Batch Correction Required | Log2 Transformation | Data Type |
|--------|---------------------------|---------------------|-----------|
| 1–2 | No | No | Database-derived and manually curated compound/target data |
| 3 | Yes | Both outputs (log2 and linear) | Raw GEO and platform annotation data |
| 4 | Yes | Yes | log2-normalized expression matrix |
| 5 | Yes | Yes | log2-normalized expression matrix |
| 6–8 | Yes | Not required | Gene lists and network files |
| 9 | Yes | Yes | Hub gene list |
| 10 | Yes | Prohibited | Linear (non-log2) expression matrix |
| 11 | No global batch correction | Internal log normalization | Single-cell raw count matrix |
| 12–16 | Yes | Yes (except docking) | Expression matrix and core gene sets |

---
title: "Methods & Technical Scheme"
format:
  html:
    theme: cosmo
    toc: true
    toc-depth: 3
    number-sections: true
    smooth-scroll: true
    page-layout: full
    fig-dpi: 300
    fig-align: center

execute:
  eval: false
  echo: true
  warning: false
  message: false
---

# Modular Analysis Workflow & Technical Scheme

## Part 1 TCM Component and Target Mining
### Module 1 ZWHQD Active Component Screening
#### Function
Based on the DCABM-TCM database, blood-absorbed prototype and metabolic compounds of six medicinal herbs in Zhenwu Huangqi Decoction were collected. Strict filtering rules, manual supplementation of key metabolites, and batch retrieval of chemical descriptors from PubChem were performed to generate a standardized dataset of candidate active components.

#### Screening Criteria
- Compounds without valid PubChem CID were removed.
- Highly toxic diester-type diterpenoid alkaloids from *Aconiti Radix Lateralis Preparata* were excluded.
- Endogenous nutrients including sugars, amino acids, vitamins, and common organic acids were discarded.
- The key astragalus metabolite **cycloastragenol** was manually supplemented.
- Duplicates and ambiguous annotations were trimmed; only categories with documented heart failure-related activity were retained.

#### Input
| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Raw herbal compound table | zwhqd_compound_dcabm.xlsx | Original component data downloaded from DCABM-TCM |
| Manual revision table | compounds_add_manu.xlsx | Artificial supplementation and correction of components |
| Compound property filtering table | compounds_properties_manu.xlsx | Toxicity and category screening annotation table |

#### Core Procedures
1. Import raw TCM component data from DCABM-TCM.
2. Text cleaning, compound separation, and extraction of PubChem CID, compound name, and molecular formula.
3. Manual addition of cycloastragenol.
4. Manual correction of CID and nomenclature; removal of entries missing CID and duplicate records.
5. Filtering by toxicity classification and compound category to eliminate unqualified ingredients.
6. Parallel batch query via PubChem to obtain IUPAC name, molecular weight, SMILES, InChI, and InChIKey.
7. Data integration, deduplication, statistical summary of ingredient numbers per herb, shared compound analysis, and export of supplementary tables.

#### Output
| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Standardized active component dataset | compounds_final.RData / compounds_final.xlsx | Filtered and annotated ZWHQD active components |
| Supplementary tables | Table S1, Table S2 | Component classification and basic information statistics |
| Shared compound statistics | compound_sharing_stats | Statistics of common ingredients across herbs |

---

### Module 2 Component Target Prediction
#### Function
Potential targets of qualified active components were predicted using three authoritative databases. Predictions were filtered by confidence thresholds, unified with UniProt annotation, merged and deduplicated to construct high-quality compound–target pairs.

#### Prediction Rules
**BATMAN-TCM**
- Retain known validated targets
- Predicted targets with score ≥ 0.84

**Super-Pred**
- Retain known experimental targets
- Predicted targets with probability > 70% and model accuracy > 90%

**SwissTargetPrediction**
- Predicted targets with probability > 0.1

All target symbols were standardized to official HGNC nomenclature via UniProt human reviewed database. Known and predicted targets were merged and deduplicated.

#### Input
| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Active component dataset | compounds_final.Rdata | Output screened components from Module 1 |
| Raw target datasets | BATMAN / Super-Pred / SwissTargetPrediction raw data | Downloaded target prediction results |
| Human protein annotation library | Uniprot human reviewed database | Standardize gene symbol and UniProt ID |

#### Output
| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Known component targets | compound_target_known.Rdata | Experimentally validated compound-target pairs |
| Predicted component targets | compound_target_pred.Rdata | High-confidence predicted targets |
| Integrated target set | compound_target_final.Rdata | Merged and deduplicated compound-target dataset |
| Supplementary tables | Table S3–S6 | Target statistics and annotation tables |

---

## Part 2 GEO Data Preprocessing and Disease Target Mining
### Module 3 GEO Dataset Download, Merging and Dual Matrix Preprocessing
#### Function
Public transcriptomic datasets were downloaded, probe-annotated, integrated, and batch-corrected. Two standardized expression matrices were generated to adapt to downstream different analytical requirements.

#### Input
| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| GEO transcriptome datasets | GSE141910, GSE17755, GSE1810 | HFrEF related expression profiling data |
| Platform annotation files | GPL16043, GPL13497, GPL1219 | Probe-to-gene annotation reference |

#### Core Procedures
1. Probe sets were mapped to Gene Symbol; multiple probes per gene were summarized by maximum expression value.
2. Cross-platform and cross-batch merging followed by batch correction using `sva::ComBat`.
3. Two normalized matrices were exported:
   - Matrix A: log2-transformed for DEG, WGCNA, functional enrichment
   - Matrix B: Linear non-log2 matrix dedicated to CIBERSORT immune infiltration

#### Output
| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Batch-corrected expression matrices | log2 matrix & linear matrix | Dual normalized matrices for different downstream analyses |
| Sample grouping metadata | sample_group_info.csv | Group label of HFrEF and control samples |
| Gene probe mapping table | probe_gene_mapping.csv | Annotation correspondence between probe and gene symbol |

---

### Module 4 Differential Expression Gene (DEG) Screening
#### Function
Differential expression genes between HFrEF and healthy controls were identified and visualized.

#### Input
| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Normalized expression matrix | log2 expression matrix | Batch-corrected log2 transformed data from Module 3 |
| Sample grouping information | sample_group_info.csv | Group label for differential comparison |

#### Parameters
- Tool: `limma`
- Loose threshold: *P* < 0.05, |log2FC| ≥ 0.585
- Stringent threshold: |log2FC| > 1
- Visualization: Volcano plot

#### Output
| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| DEG gene list | deg_up / deg_down / deg_all | Up-regulated, down-regulated and total differential genes |
| DEG visualization plot | deg_volcano.png | Volcano plot of differential expression |

---

### Module 5 WGCNA Weighted Gene Co-expression Network Analysis
#### Function
Construct scale-free co-expression networks, partition gene modules, and screen phenotype-related hub genes associated with HFrEF.

#### Input
| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Normalized expression matrix | log2 expression matrix | Batch-corrected log2 expression data |
| Clinical phenotype data | clinical_trait_info.csv | HFrEF phenotypic characteristic data |

#### Core Procedures
1. Quality control using `goodSamplesGenes()` to remove outlier samples and low-expression genes.
2. Optimal soft threshold determined by `pickSoftThreshold()`.
3. Module-trait correlation calculated via Spearman and Mantel test.
4. Hub genes filtered by Module Membership (MM ≥ 0.6) and Gene Significance (GS ≥ 0.1).
5. Key disease-related modules: MEbrown, MEroyalblue.

#### Output
| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Gene module classification | wgcna_module_gene_list | Gene distribution in each co-expression module |
| WGCNA hub gene set | wgcna_hub_genes.txt | Phenotype-related core genes |

---

### Module 6 HFrEF Disease Target Integration
#### Function
Integrate DEGs and WGCNA hub genes to obtain candidate disease targets for HFrEF.

#### Input
| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Differential gene set | deg_all | Full DEG list from Module 4 |
| Co-expression hub genes | wgcna_hub_genes.txt | Core genes from WGCNA Module 5 |

#### Core Logic
Intersection of DEGs and WGCNA hub genes, retaining genes with |log2FC| > 1 to ensure statistical reliability.

#### Output
| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Integrated disease target set | hfref_disease_targets.txt | Candidate HFrEF pathogenic genes |

---

## Part 3 Network Construction and Hub Gene Identification
### Module 7 Compound-Target and PPI Network Construction
#### Function
Construct herb–component–target network and protein–protein interaction (PPI) network of overlapping drug-disease targets.

#### Input
| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| ZWHQD component targets | compound_target_final.Rdata | Merged compound-target genes from Module 2 |
| HFrEF candidate targets | hfref_disease_targets.txt | Integrated disease genes from Module 6 |

#### Parameters
- Network visualization: Cytoscape 3.10.3
- PPI network: STRING database, *Homo sapiens*, confidence > 0.700

#### Output
| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Herb-component-target network files | ctn_node.csv / ctn_edge.csv | Node and edge information of regulatory network |
| PPI network files | ppi_node.csv / ppi_edge.csv | Protein-protein interaction network data |
| PPI interaction matrix | ppi_adjacency_matrix.csv | Gene interaction adjacency matrix |

---

### Module 8 Hub Gene Screening via MCODE and Cytohubba
#### Function
Mine core sub-networks from PPI network and identify ZWHQD–HFrEF hub genes.

#### Input
| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| PPI network file | ppi_edge.csv | Interaction network from Module 7 |

#### Parameters
- MCODE: degree cutoff = 2, node score cutoff = 0.2, k-core = 2
- Cytohubba: 4 local + 6 global topological algorithms
- Integration: MCODE core genes + Cytohubba top genes + DEG/WGCNA intersection genes

#### Output
| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Final hub gene set | zwhqd_hfref_hub_genes.txt | 117 core genes for subsequent analysis |

---

## Part 4 Functional and Mechanism Analysis
### Module 9 GO/KEGG Functional Enrichment Analysis
#### Function
Annotate biological functions and core signaling pathways of hub genes.

#### Input
| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Hub gene list | zwhqd_hfref_hub_genes.txt | Core genes screened from Module 8 |

#### Parameters
- ID conversion: `org.Hs.eg.db`
- Tool: `clusterProfiler`
- Threshold: *P* < 0.001
- Dimension: BP, CC, MF, KEGG pathway

#### Output
| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Enrichment result tables | go_anno.csv / kegg_anno.csv | GO and KEGG annotation statistics |
| Enrichment plots | bubble_plot / bar_plot | Visualization of functional enrichment |

---

### Module 10 CIBERSORT Immune Infiltration Analysis
#### Function
Estimate relative abundance of 22 immune cell subtypes, compare intergroup differences, and analyze gene–immune correlation.

#### Input
| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Non-log2 expression matrix | linear_expression_matrix | Unlogged batch-corrected data for immune deconvolution |
| Sample grouping table | sample_group_info.csv | Group label for difference comparison |

#### Rules
- Only non-log2 matrix is allowed; log-transformed data are prohibited
- Reference signature: LM22
- Retain samples with deconvolution *P* < 0.05

#### Output
| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Immune cell abundance matrix | immune_cell_abundance.csv | Relative proportion of 22 immune cell subtypes |
| Intergroup difference plots | immune_boxplot.png | Between-group comparison of immune cell proportions |
| Correlation heatmap | gene_immune_heatmap.png | Correlation between hub genes and immune cells |

---

### Module 11 Single-Cell RNA-seq Analysis
#### Function
Perform quality control, dimensionality reduction, clustering, and cell type annotation of the single-cell data, then evaluate hub gene set activity across cell subtypes.

#### Input
| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Single-cell count matrix | GSE1810_raw_count | Original single-cell expression matrix |

#### Parameters
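- Quality control: filter low-quality cells by mitochondrial read proportion; global ComBat batch correction is not applied
- Normalization: LogNormalize; top 2000 highly variable genes
- Dimensionality reduction and clustering: first 15 principal components; graph-based clustering at resolution = 1.5; UMAP for visualization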
- Annotation: SingleR automatic annotation + CellMarker manual correction
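The QC and normalization parameters can be illustrated as follows. Seurat's LogNormalize divides each count by the cell's total counts, multiplies the result by a scale factor (10,000 by default), and applies a natural log1p. The sketch below assumes human gene symbols with the `MT-` mitochondrial prefix. It also uses a hypothetical 20% mitochondrial cutoff, because the document does not state the threshold actually used.

```python
from math import log1p


def percent_mito(counts, genes, prefix="MT-"):
    """Share of a cell's counts (in %) coming from mitochondrial genes."""
    total = sum(counts)
    mito = sum(c for c, g in zip(counts, genes) if g.startswith(prefix))
    return 100.0 * mito / total if total else 0.0


def log_normalize(counts, scale_factor=10_000):
    """Seurat-style LogNormalize: log1p(count / total counts * scale_factor)."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [log1p(c / total * scale_factor) for c in counts]


def qc_and_normalize(cells, genes, max_mito_pct=20.0):
    """Drop cells above a mitochondrial cutoff, then LogNormalize the survivors.

    cells maps a cell barcode to raw counts aligned with genes.
    max_mito_pct=20.0 is a hypothetical placeholder, not the pipeline's stated value.
    """
    return {
        barcode: log_normalize(counts)
        for barcode, counts in cells.items()
        if percent_mito(counts, genes) <= max_mito_pct
    }


# Hypothetical three-gene example: c2 is dominated by mitochondrial reads.
genes = ["GAPDH", "ACTB", "MT-CO1"]
cells = {"c1": [50, 40, 10], "c2": [10, 10, 80]}
normalized = qc_and_normalize(cells, genes)
print(sorted(normalized))
```

In practice these steps run inside Seurat on sparse matrices. The plain lists here only show the arithmetic.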

#### Output
| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Cell annotation result | cell_type_annotation.csv | Cell subtype classification and labeling |
| Gene set activity matrix | aucell_activity_score.csv | Functional activity score of hub gene set |
| Single-cell visualization | umap_cluster_plot.png | UMAP visualization of clusters and cell distribution |

---

## Part 5 Causal Inference, Diagnostic Modeling and Mechanism Validation
### Module 12 Mendelian Randomization Analysis
#### Function
Infer causal relationships between hub gene expression and HFrEF using eQTL and GWAS summary statistics.

#### Input
| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Core hub gene list | zwhqd_hfref_hub_genes.txt | Candidate causal genes |
| eQTL dataset | eqtl_summary.csv | Expression quantitative trait loci data |
| GWAS summary data | ebi-a-GCST90018910 | HFrEF genome-wide association summary |

#### Parameters
- SNP threshold: *P* < 5e-8; LD clumping: r² < 0.001, window = 10000 kb
- Methods: IVW, Wald Ratio, MR-Egger, Weighted Median, MR-PRESSO
- Threshold: *P* < 0.05
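The two core estimators can be written in a few lines. With a single instrument, the causal effect is the Wald ratio β_Y/β_X. With several independent instruments, the fixed-effect IVW estimate pools the per-SNP ratios, weighting each by β_X²/se_Y². This is a minimal sketch with first-order standard errors and hypothetical effect sizes. Packages such as TwoSampleMR may report a multiplicative random-effects IVW standard error instead.

```python
from math import sqrt


def wald_ratio(beta_exp, beta_out, se_out):
    """Single-SNP causal estimate with a first-order standard error."""
    return beta_out / beta_exp, se_out / abs(beta_exp)


def ivw_fixed(beta_exp, beta_out, se_out):
    """Fixed-effect inverse-variance weighted estimate over independent SNPs."""
    weights = [1.0 / (s * s) for s in se_out]
    num = sum(bx * by * w for bx, by, w in zip(beta_exp, beta_out, weights))
    den = sum(bx * bx * w for bx, w in zip(beta_exp, weights))
    return num / den, sqrt(1.0 / den)


# Hypothetical eQTL (exposure) and HFrEF GWAS (outcome) effects for three clumped SNPs.
beta_exp = [0.20, 0.40, 0.10]
beta_out = [0.10, 0.20, 0.05]
se_out = [0.02, 0.03, 0.01]
estimate, se = ivw_fixed(beta_exp, beta_out, se_out)
print(f"IVW beta = {estimate:.3f} (SE {se:.3f})")
```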

#### Output
| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| MR analysis result table | mr_analysis_result.csv | Causal effect estimation and statistics |
| Core causal gene list | causal_core_genes.txt | Core genes with significant causal estimates |

---

### Module 13 Colocalization and Causal Direction Verification
#### Function
Assess the robustness of the MR results and confirm the causal direction for candidate genes.

#### Input
| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| MR instrumental variables | snp_iv_list.csv | Significant SNP IVs from MR analysis |
| Omics summary data | eqtl & gwas summary | Matched eQTL and GWAS summary datasets |

#### Methods
1. Bayesian colocalization with the `coloc` package; support for a shared causal variant is judged by the posterior probability of hypothesis H4 (PP.H4)
2. Steiger directionality test and bidirectional MR to guard against reverse causation and pleiotropy
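PP.H4 comes from the approximate Bayes factor framework in `coloc`. The sketch below re-implements that calculation in pure Python for illustration. Per-SNP Wakefield Bayes factors for each trait are combined into posteriors for H0 to H4 under coloc's default priors (p1 = p2 = 1e-4, p12 = 1e-5). The sketch makes three assumptions:

- both datasets provide z-scores and standard errors for the same SNPs;
- the eQTL trait uses a prior effect SD of 0.15, which treats it as a quantitative trait with sdY = 1;
- the case-control GWAS uses a prior effect SD of 0.2.

The inputs in the usage example are hypothetical. Real analyses should use the `coloc` package itself.

```python
from math import exp, log


def log_abf(z, se, prior_sd):
    """Wakefield log approximate Bayes factor for one SNP (z = beta / se)."""
    v, w = se * se, prior_sd * prior_sd
    r = w / (w + v)
    return 0.5 * (log(1.0 - r) + r * z * z)


def log_sum(values):
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + log(sum(exp(x - m) for x in values))


def log_diff(x, y):
    """Numerically stable log(exp(x) - exp(y)); -inf when the difference is not positive."""
    m = max(x, y)
    d = exp(x - m) - exp(y - m)
    return m + log(d) if d > 0 else float("-inf")


def coloc_posteriors(z1, se1, z2, se2, sd1=0.15, sd2=0.2,
                     p1=1e-4, p2=1e-4, p12=1e-5):
    """Return [PP.H0, PP.H1, PP.H2, PP.H3, PP.H4] for two traits over the same SNPs."""
    l1 = [log_abf(z, s, sd1) for z, s in zip(z1, se1)]
    l2 = [log_abf(z, s, sd2) for z, s in zip(z2, se2)]
    s1, s2 = log_sum(l1), log_sum(l2)
    s12 = log_sum([a + b for a, b in zip(l1, l2)])
    log_h = [
        0.0,                                          # H0: neither trait associated
        log(p1) + s1,                                 # H1: trait 1 only
        log(p2) + s2,                                 # H2: trait 2 only
        log(p1) + log(p2) + log_diff(s1 + s2, s12),   # H3: distinct causal variants
        log(p12) + s12,                               # H4: one shared causal variant
    ]
    total = log_sum(log_h)
    return [exp(h - total) for h in log_h]


# Hypothetical z-scores for five SNPs; both traits peak at the third SNP.
eqtl_z = [1.0, 0.5, 8.0, 0.3, 0.2]
gwas_z = [0.4, 0.6, 7.0, 0.1, 0.5]
se = [0.05] * 5
pp = coloc_posteriors(eqtl_z, se, gwas_z, se)
print(f"PP.H4 = {pp[4]:.3f}")
```

If the two association peaks fall on different SNPs, the posterior mass moves from H4 to H3.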

#### Output
| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Colocalization statistical results | coloc_result.csv | Posterior probability of shared causal locus |
| Final confirmed core genes | verified_core_genes.txt | Validated HFrEF pathogenic genes |

---

### Module 14 Diagnostic Model Construction
#### Function
Build and validate a diagnostic prediction model for HFrEF based on the core causal genes.

#### Input
| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Core gene expression matrix | core_gene_expression.csv | Expression profile of verified core genes |
| Sample grouping metadata | sample_group_info.csv | HFrEF and control group labels |

#### Workflow
Univariate logistic screening → multivariate logistic regression → nomogram construction → evaluation by ROC curve, calibration curve, and decision curve analysis (DCA).
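Two of the evaluation metrics can be sketched directly. AUC is the probability that a randomly chosen HFrEF sample receives a higher predicted probability than a randomly chosen control. The decision-curve net benefit at threshold probability p_t is TP/n − FP/n × p_t/(1 − p_t). The sketch assumes that predicted probabilities from the fitted logistic model are already available. The labels and probabilities below are hypothetical.

```python
def roc_auc(labels, probs):
    """AUC as the probability that a random case outscores a random control (ties count 0.5)."""
    cases = [p for y, p in zip(labels, probs) if y == 1]
    controls = [p for y, p in zip(labels, probs) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def net_benefit(labels, probs, threshold):
    """Decision-curve net benefit at threshold probability p_t (probs >= p_t called positive)."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Hypothetical outcome labels (1 = HFrEF) and model-predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
print(f"AUC = {roc_auc(labels, probs):.3f}")
for pt in (0.2, 0.5):
    print(f"net benefit at p_t = {pt}: {net_benefit(labels, probs, pt):.3f}")
```

A DCA curve plots net benefit across a range of p_t values and compares it with the "treat all" and "treat none" strategies.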

#### Output
| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Diagnostic nomogram | model_nomogram.png | Visual predictive model diagram |
| Model evaluation results | roc / calibration / dca curve | Discrimination (AUC), calibration, and clinical net benefit |

---

### Module 15 Single-Gene GSEA Analysis
#### Function
Identify biological pathways and molecular mechanisms associated with the core gene **CASP8**.

#### Input
| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Normalized transcriptome matrix | log2 expression matrix | Batch-corrected log2 expression data |
| Core representative gene | CASP8 | Key validated pathogenic gene |

#### Core Procedures
1. Divide samples into high- and low-expression groups according to *CASP8* expression.
2. Identify differentially expressed genes between the two groups using `limma`.
3. Perform gene set enrichment analysis (GSEA) with `clusterProfiler` to identify enriched signaling pathways.
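The enrichment score at the heart of GSEA can be sketched as a running sum over the ranked gene list. The sum steps up at genes in the set, weighted by their ranking metric, and steps down at genes outside it. The score is the maximum deviation from zero. This minimal version uses the standard weight of 1 and hypothetical genes and scores. It omits the permutation-based significance and normalized enrichment score that `clusterProfiler` reports, and the ranking metric used in this pipeline is not specified.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """GSEA running-sum enrichment score (signed maximum deviation from zero).

    ranked_genes/scores: genes sorted by decreasing ranking metric.
    Hits step up by |score|**weight / N_R; misses step down by 1 / (N - N_H).
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must partially overlap the ranked list")
    n_r = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        running += abs(s) ** weight / n_r if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list (e.g. high vs low CASP8 statistics) and two gene sets.
genes = ["G1", "G2", "G3", "G4", "G5", "G6"]
metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
print(enrichment_score(genes, metric, {"G1", "G2"}))
print(enrichment_score(genes, metric, {"G5", "G6"}))
```

A set concentrated at the top of the ranking gives a positive score. A set concentrated at the bottom gives a negative one.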

#### Output
| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| GSEA enrichment table | gsea_result.csv | Enriched pathway annotation and statistics |
| GSEA visualization plot | gsea_enrichment_plot.png | Pathway enrichment landscape diagram |

---

### Module 16 Molecular Docking Analysis
#### Function
Evaluate the binding affinity and molecular interactions between ZWHQD active components and the core target protein CASP8.

#### Input
| Input Content | Data Name | Description |
|---------------|-----------|-------------|
| Representative active components | Five core ZWHQD ingredients | Bioactive small molecules from Module 1 |
| Receptor protein structure | PDB: 4ZBW | Crystal structure of human CASP8 |

#### Core Procedures
1. Prepare the protein and ligand structures in Schrödinger Maestro.
2. Predict the active binding pocket and perform molecular docking.
3. Evaluate binding energies and visualize the 3D binding conformations in PyMOL.

#### Output
| Output Content | Data Name | Description |
|----------------|-----------|-------------|
| Docking scoring table | docking_score.csv | Binding affinity ranking of each component |
| 3D binding conformation plot | docking_complex_plot.png | Visual molecular interaction diagram |

---

## Data Specification and Usage Rules
| Module | Batch Correction Required | Log2 Transformation | Data Type |
|--------|---------------------------|---------------------|-----------|
| 1–2 | No | No | Database-derived and manually curated compound/target data |
| 3 | Yes | Output both | Raw GEO and platform annotation data |
| 4 | Yes | Yes | log2 normalized expression matrix |
| 5 | Yes | Yes | log2 normalized expression matrix |
| 6–8 | Yes | Not required | Gene lists and network files |
| 9 | Yes | Not required | Hub gene list |
| 10 | Yes | Prohibited | Linear non-log2 expression matrix |
| 11 | No (no global ComBat) | LogNormalize within the single-cell workflow | Single-cell raw count matrix |
| 12–16 | Yes | Yes (except docking) | Expression matrix and core gene sets |

---

<table class="nav-table" width="100%">
  <tr>
    <td align="left">
      [Home](index.qmd) | [About](about.qmd) | [Results](results.qmd)
    </td>
    <td align="right">
      [Start Analysis](chapters/Part1_TCM/01_active_component_screening.qmd)
    </td>
  </tr>
</table>