Bioinformatics & Genomic Analysis · Methodology Demonstration

Transcriptomic Profiling of Type 2 Diabetes — Riyadh Cohort

📍 Illustrative Arab-population cohort · 🏥 Tertiary teaching hospital (simulated) · 👥 n = 120 participants (simulated dataset)

Simulated Case Study

This case study uses simulated data to demonstrate the statistical methodology, analysis workflow, and reporting standards we apply to real client projects. No actual patient or institutional data is represented.

RNA-seq · DESeq2 · GSEA · Network Analysis

Confidentiality Notice

All data presented in this case study is simulated and replicates the statistical structure of the original dataset. Patient identifiers, hospital name, and raw sequencing files remain confidential. No real genomic data is disclosed.

Project Overview

This project was conducted in collaboration with the endocrinology research unit of a major tertiary hospital in Riyadh. The study enrolled 120 participants: 60 diagnosed with Type 2 Diabetes Mellitus (T2DM) and 60 age- and sex-matched healthy controls. Whole-blood samples were collected for bulk RNA sequencing (100 bp paired-end reads, Illumina NovaSeq 6000).

The analysis pipeline covered raw FASTQ quality assessment, adapter trimming, alignment to the human reference genome (GRCh38), and gene-level quantification. Differential expression analysis with DESeq2 identified 847 significantly differentially expressed genes (|log2FC| ≥ 1.5, padj < 0.05), with notable upregulation in the TNF signaling and IL-6 pathways and downregulation of insulin signaling components. GSEA against KEGG and GO gene sets, followed by protein-protein interaction network analysis, highlighted three hub genes (IRS1, SIRT1, and ADIPOQ) as candidate early-detection biomarkers specific to this Arab-population cohort.
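The significance filter described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the DESeq2 results have already been exported as one record per gene. The `GeneResult` class, the `filter_de` helper, and the toy gene names are hypothetical and are not taken from the project code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GeneResult:
    gene: str
    log2fc: float            # log2 fold change, T2DM vs control
    padj: Optional[float]    # adjusted p-value; None stands in for DESeq2's NA


def filter_de(
    results: List[GeneResult],
    lfc_cutoff: float = 1.5,
    padj_cutoff: float = 0.05,
) -> Tuple[List[str], List[str]]:
    """Split genes into up- and downregulated lists using both thresholds."""
    up, down = [], []
    for r in results:
        # Genes without an adjusted p-value, or above the cutoff, are dropped.
        if r.padj is None or r.padj >= padj_cutoff:
            continue
        if r.log2fc >= lfc_cutoff:
            up.append(r.gene)
        elif r.log2fc <= -lfc_cutoff:
            down.append(r.gene)
    return up, down


# Toy records with made-up values, for illustration only.
toy = [
    GeneResult("GENE_A", 2.1, 0.001),    # passes, upregulated
    GeneResult("GENE_B", -1.8, 0.010),   # passes, downregulated
    GeneResult("GENE_C", 1.2, 0.0005),   # fails the fold-change cutoff
    GeneResult("GENE_D", 3.0, 0.20),     # fails the padj cutoff
    GeneResult("GENE_E", -2.4, None),    # no adjusted p-value
]
up, down = filter_de(toy)
print(len(up), "up;", len(down), "down")
```

Both conditions must hold for a gene to be counted, so a large fold change with a weak adjusted p-value is excluded, as is a strong p-value with a small effect.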

Key Findings

✓ 847 differentially expressed genes (542 upregulated, 305 downregulated)
✓ Top enriched pathways: TNF signaling, PI3K-Akt, oxidative phosphorylation
✓ Hub genes identified: IRS1, SIRT1, ADIPOQ
✓ AUC = 0.91 for a 5-gene diagnostic signature in the Arab-population cohort
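For readers unfamiliar with the AUC metric, the sketch below shows one standard way to compute it: as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. How the 5-gene signature is combined into a single score is not described here, so the per-sample scores are hypothetical placeholders and `roc_auc` is an illustrative helper rather than the project's code.

```python
from typing import Sequence


def roc_auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC as the fraction of case/control pairs ranked correctly (ties = 0.5)."""
    if not case_scores or not control_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical signature scores, for illustration only.
cases = [0.9, 0.8, 0.7, 0.4]
controls = [0.6, 0.5, 0.3, 0.2]
auc = roc_auc(cases, controls)
print(f"AUC = {auc:.3f}")  # 14 of 16 pairs are ranked correctly: 0.875
```

The pairwise form is quadratic in sample size, which is fine for a cohort of 120; a rank-based implementation would be the usual choice for larger datasets.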

Analytical Methods

▹ Quality control: FastQC & MultiQC on raw FASTQ files
▹ Read trimming: Trimmomatic (adapter removal, quality filtering)
▹ Alignment: STAR aligner → human reference genome GRCh38
▹ Gene quantification: featureCounts (GENCODE v44 annotation)
▹ Differential expression: DESeq2 (|log2FC| ≥ 1.5, padj < 0.05)
▹ Pathway enrichment: GSEA (KEGG, Gene Ontology Biological Process)
▹ Network analysis: STRING database v12 + Cytoscape (hub gene identification)
▹ Visualization: EnhancedVolcano, ComplexHeatmap, ggplot2
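The hub gene identification step can be illustrated with a simple degree ranking over an interaction edge list. The criterion actually used to call hubs is not stated above, so ranking by node degree is an assumption made for this sketch. The `degree_ranking` helper and the node names `P1` to `P5` are hypothetical and do not represent STRING interactions.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def degree_ranking(edges: Iterable[Tuple[str, str]], top_n: int = 3) -> List[Tuple[str, int]]:
    """Rank nodes of an undirected network by number of distinct neighbours."""
    degree: Counter = Counter()
    seen = set()
    for a, b in edges:
        key = frozenset((a, b))
        # Skip self-loops and duplicate edges (A-B is the same edge as B-A).
        if a == b or key in seen:
            continue
        seen.add(key)
        degree[a] += 1
        degree[b] += 1
    # Highest degree first; ties are broken alphabetically for stable output.
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Toy edge list with placeholder node names, for illustration only.
toy_edges = [
    ("P1", "P2"),
    ("P1", "P3"),
    ("P1", "P4"),
    ("P2", "P3"),
    ("P2", "P1"),  # duplicate of P1-P2, ignored
    ("P4", "P5"),
]
hubs = degree_ranking(toy_edges)
print(hubs)  # [('P1', 3), ('P2', 2), ('P3', 2)]
```

Degree is only one of several centrality measures; betweenness or closeness centrality could be substituted in the same ranking step.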

Tools & Software

R / Bioconductor · Python 3.11 · STAR 2.7 · FastQC · DESeq2 · GSEA · Cytoscape · Illumina NovaSeq 6000
