# Data Science Projects
## AI-Powered Cyber Risk Scoring Engine
### Overview
This project develops a machine learning-based cyber risk scoring engine that classifies enterprise assets into Low, Medium, High, and Critical risk categories.
It integrates cybersecurity, governance, and data science to move beyond traditional spreadsheet-based risk assessments toward predictive, data-driven decision-making.
### Business Problem
Organizations often rely on manual and subjective methods to assess cyber risk, which are:
- inconsistent
- time-consuming
- difficult to scale
This project demonstrates how machine learning can improve:
- risk prioritization
- remediation strategies
- governance and compliance processes
### Objectives
- Build a predictive cyber risk classification model
- Identify key drivers of cyber risk
- Support GRC decision-making using data science
- Demonstrate real-world cybersecurity analytics capability
### Features Used
- CVSS Score (vulnerability severity)
- Patch Delay (days)
- Incident Frequency
- System Criticality
- Third-Party Risk Score
- Compliance Score
- Failed Login Rate
- Data Sensitivity
- Open Security Findings
### Model
A Random Forest Classifier was used for multiclass classification.
- Handles complex feature interactions
- Robust against overfitting
- Suitable for tabular cybersecurity data
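As a minimal sketch of this setup (the features, labels, and thresholds below are synthetic stand-ins invented for illustration, not the project's dataset), a multiclass Random Forest can be trained as follows:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(42)
n = 1000
# Synthetic stand-ins for three of the features listed above (illustrative only)
X = np.column_stack([
    rng.uniform(0, 10, n),    # CVSS score
    rng.integers(0, 180, n),  # patch delay (days)
    rng.poisson(2, n),        # incident frequency
])
# Toy labels: 0=Low, 1=Medium, 2=High, 3=Critical, driven mostly by CVSS
y = np.clip((X[:, 0] // 3).astype(int), 0, 3)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```

The same pattern extends to the full feature list; `feature_importances_` on the fitted model is what surfaces the risk drivers discussed below.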
### Model Performance
- Accuracy: 86%
- ROC-AUC: 0.94 (excellent classification capability)
Key Observations:
- Strong performance for Critical risk detection
- Reduced performance for Low/Medium classes due to class imbalance
### Key Findings
The most influential drivers of cyber risk include:
- CVSS score
- Patch delay
- Open findings
- Failed login rate
- Compliance score
- Third-party risk score
These findings confirm that delayed remediation, weak control posture, and unresolved vulnerabilities significantly increase cyber risk exposure.
### Practical Impact (GRC)
This model enables:
- Risk-based prioritization of assets
- Faster remediation decision-making
- Improved audit targeting
- Better third-party risk evaluation
### Executive Value
This solution supports:
- Data-driven cyber risk governance
- Board-level reporting dashboards
- Regulatory compliance (NIST, ISO 27001)
- Operational risk visibility
It demonstrates how AI can transform cybersecurity from reactive to proactive and predictive.
### Limitations
- Dataset imbalance affects minority class prediction
- Synthetic dataset (simulated environment)
Future Improvements:
- Apply SMOTE for class balancing
- Integrate real-world CVE/NVD data
- Add SIEM log analytics
- Deploy as a Streamlit web application
- Build Power BI dashboard for executives
### Tech Stack
- Python
- pandas, numpy
- scikit-learn
- matplotlib
- Jupyter Notebook
### Future Enhancements
- Real-time risk scoring pipeline
- Cloud deployment (AWS)
- Integration with security monitoring systems
- AI Governance and Risk modeling extension
### Keywords
Cybersecurity, GRC, Risk Management, Machine Learning, Data Science, AI, Governance, Compliance, Cyber Risk, Predictive Analytics
## Telco Customer Churn – End-to-End Decision Intelligence System
Designing a production-grade churn system that converts ML signals into revenue-preserving decisions
This project demonstrates how modern data science is applied inside real companies: from ambiguous business problems to clear, defensible actions.
Unlike tutorial projects, this system emphasizes:
- business framing
- behavioral segmentation
- explainable modeling
- ROI-aware decision logic
### Problem Statement
Subscription businesses lose millions annually to churn. The challenge is not predicting churn, but deciding:
- Which customers are worth saving
- Which churn is unavoidable
- How to intervene without destroying margin
This project operationalizes churn management using a decision framework aligned with how teams at Google, Meta, Amazon, Netflix, and Microsoft work.
### System Architecture (High Level)

```
Raw Customer Data
   ↓
Business EDA (Phase 1)
   ↓
Behavioral Segmentation (Phase 2)
   ↓
Churn Modeling (Phase 3)
   ↓
Decision Layer (Phase 4)
   ↓
Retention Actions (CRM / Ops Ready)
```

Core decision logic:
Segment × Churn Risk × Customer Value
### Phase 1 – Business EDA
Understand churn as an economic problem
Key Visuals
#### Overall Churn Rate

~27% churn, a material revenue risk requiring targeted intervention
#### Churn by Contract Type

Month-to-month customers churn 3–4× more than long-term contracts
#### Churn by Tenure Band

Highest churn occurs in the first 6β12 months
#### Churn by Internet Service

Fiber optic users show elevated churn, suggesting an expectation gap
### Phase 2 – Behavioral Segmentation
Move from "all customers" to decision-ready personas
#### Elbow Method for K-Means

Four stable, interpretable segments selected
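A sketch of the elbow computation (synthetic blobs stand in for the customer features here; the `make_blobs` parameters are illustrative, not the project's data):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for customer features (tenure, spend, usage); illustrative only
X, _ = make_blobs(n_samples=600, centers=4, cluster_std=1.0, random_state=0)

# Inertia (within-cluster sum of squares) for a range of k
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(1, 9)}
# The "elbow" is where adding clusters stops reducing inertia sharply
for k, v in inertias.items():
    print(k, round(v, 1))
```

Plotting `inertias` against `k` reproduces the elbow chart shown above; the bend at four clusters is what motivated the four personas.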
#### Customer Segmentation – K-Means (PCA Projection)

Customers cluster into four distinct behavioral personas based on tenure, spend, and service usage.
#### Segment Centers – K-Means with Centroids

Centroids represent the behavioral "center" of each segment, enabling stable personas and consistent downstream decision-making.
#### Segment Profiles

| Segment | Churn | Value | Business Meaning |
|---|---|---|---|
| High value + Low churn | Low | Very high | Core revenue base |
| High value + High churn | High | Moderate | Revenue at risk |
| Low value + High churn | High | Low | Poor ROI |
| Low value + Low churn | Low | Low | Stable, low margin |
### Phase 3 – Churn Modeling
Predict churn with explainability
#### Global Model Performance

- ROC-AUC ≈ 0.85
- Accuracy ≈ 80%
- Conservative by design, minimizing wasted retention spend
#### Top Churn Drivers

Increases churn
- Fiber optic service
- Month-to-month contracts
- Early tenure
- Lack of tech support
Reduces churn
- Two-year contracts
- Longer tenure
- Lower monthly charges
The model explains what to fix, not just who might leave.
### Phase 4 – Decision Layer
From prediction to action
#### Segment Strategy Matrix

| Segment Type | Action |
|---|---|
| High value + High churn | Aggressive retention |
| High value + Low churn | Loyalty rewards |
| Low value + High churn | Minimal spend |
| Low value + Low churn | Maintain |
#### Customer-Level Retention Priority

Each customer receives:
- churn probability (segment-aware)
- value proxy
- actionability score
- ranked retention priority
- recommended action
This output is CRM-ready.
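A hypothetical sketch of this CRM-ready output (the customer IDs, values, and thresholds are invented for illustration; the project's actual scoring logic may differ):

```python
import pandas as pd

# Hypothetical scored customers: churn probability from the model, value proxy from billing
customers = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "churn_prob":  [0.82, 0.15, 0.78, 0.10],
    "value":       [900, 1200, 120, 150],   # e.g., expected annual revenue
})

# Priority = expected revenue at risk
customers["priority"] = customers["churn_prob"] * customers["value"]

def action(row):
    # Illustrative mapping of the Segment Strategy Matrix above
    if row.churn_prob >= 0.5:
        return "aggressive retention" if row.value >= 500 else "minimal spend"
    return "loyalty rewards" if row.value >= 500 else "maintain"

customers["recommended_action"] = customers.apply(action, axis=1)
customers = customers.sort_values("priority", ascending=False)
print(customers)
```

The ranked table is what a CRM or ops team would consume: highest expected revenue at risk first, each row with a recommended action.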
### Impact
This system enables leadership to reduce churn while protecting margin, by acting only where ROI is positive.
What this project demonstrates:
- Strategic thinking (not just ML)
- Decision intelligence
- Explainable AI
- Production-ready analytics
### Why This Matters
Most churn projects stop at "who might churn." This project answers "who should we act on, and why."
That distinction is what separates academic ML from production data science.
### Final Note
This project mirrors how churn analytics is built and deployed in real organizations, combining modeling, segmentation, and business decision-making into one system.
## Project 1 – Physics-Informed Neural Networks (PINNs)
Heat Transfer Modeling in Chemical Reactors
### Overview
This project implements a Physics-Informed Neural Network (PINN) to model transient heat diffusion in a chemical reactor using first-principles physics embedded into a neural network.
Rather than relying purely on data, the model enforces the 1D heat equation during training, enabling physically consistent predictions even with sparse or noisy measurements, a key requirement for engineering systems and digital twins.
### Objectives
- Model transient heat diffusion using a neural network
- Enforce physical consistency via PDE residual minimization
- Validate predictions against a closed-form analytical solution
- Produce engineering-ready outputs (profiles, error maps, metrics)
### Governing Physics (Engineering Form)
Heat diffusion (1D, transient):
∂T/∂t = α · ∂²T/∂x²
Where:
- T(x,t) = temperature
- α = k / (ρ · c_p) = thermal diffusivity
Boundary conditions:
T(0,t) = 0
T(1,t) = 0
Initial condition:
T(x,0) = sin(πx)
Analytical validation solution:
T(x,t) = exp(−α·π²·t) · sin(πx)
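As a quick sanity check (the evaluation point and step size are arbitrary choices), the analytical solution can be verified to satisfy the PDE by finite differences:

```python
import numpy as np

alpha = 1.0

def T_exact(x, t):
    # Closed-form solution used for validation
    return np.exp(-alpha * np.pi**2 * t) * np.sin(np.pi * x)

# Central finite differences at an interior point
x0, t0, h = 0.3, 0.1, 1e-5
dT_dt = (T_exact(x0, t0 + h) - T_exact(x0, t0 - h)) / (2 * h)
d2T_dx2 = (T_exact(x0 + h, t0) - 2 * T_exact(x0, t0) + T_exact(x0 - h, t0)) / h**2
print(abs(dT_dt - alpha * d2T_dx2))  # residual of the heat equation, close to zero
```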
### Methods & Model Design
- Model: Multi-Layer Perceptron (MLP) mapping (x, t) → T(x,t)
- Training strategy: physics collocation sampling across space–time
- Physics enforcement: automatic differentiation
- Loss components:
  - PDE residual loss (physics enforcement)
  - Boundary condition loss
  - Initial condition loss
  - (Optional) sparse sensor data loss
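The PDE residual loss can be sketched with PyTorch autograd (a minimal illustration; the layer sizes, α value, and collocation sampling are assumptions, not the project's exact architecture):

```python
import torch
import torch.nn as nn

# Small MLP mapping (x, t) -> T; sizes are illustrative
model = nn.Sequential(nn.Linear(2, 32), nn.Tanh(),
                      nn.Linear(32, 32), nn.Tanh(),
                      nn.Linear(32, 1))

def pde_residual(model, x, t, alpha=1.0):
    # Collocation inputs need gradients for automatic differentiation
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    T = model(torch.stack([x, t], dim=-1)).squeeze(-1)
    dT_dt = torch.autograd.grad(T.sum(), t, create_graph=True)[0]
    dT_dx = torch.autograd.grad(T.sum(), x, create_graph=True)[0]
    d2T_dx2 = torch.autograd.grad(dT_dx.sum(), x, create_graph=True)[0]
    # Heat-equation residual dT/dt - alpha * d2T/dx2; the PDE loss is its mean square
    return dT_dt - alpha * d2T_dx2

x, t = torch.rand(128), torch.rand(128)
loss_pde = pde_residual(model, x, t).pow(2).mean()
```

The BC and IC losses are ordinary mean-squared errors on boundary and initial points; the total loss is their (optionally weighted) sum.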
### Visual Results
#### Training Loss Convergence (Total, PDE, BC, IC)
Shows stable convergence of the physics-constrained loss components.

- All loss components (PDE, boundary, initial) decrease by several orders of magnitude
- The total loss converges to ~10⁻⁴
- Periodic spikes appear in the loss curves
#### Interpretation
- The decreasing PDE loss confirms the network is learning a function that satisfies the heat equation
- The BC and IC losses dropping faster indicates that boundary and initial constraints are easier to satisfy than interior physics
The periodic spikes are normal in PINNs and occur because:
- The optimizer alternates between satisfying competing constraints
- Physics, BCs, and ICs pull the solution in slightly different directions
- The overall downward trend indicates stable, convergent training
#### Physical meaning
The network is not just fitting data: it is learning a temperature field that obeys energy conservation throughout the domain.
This confirms successful physics enforcement, not just numerical curve fitting.
#### Temperature Profiles – PINN vs Analytical
Comparison across multiple time slices validates physical accuracy.

- PINN predictions (solid lines) overlap almost perfectly with analytical solutions (dashed)
- Agreement holds across all time slices: t = 0.00, 0.25, 0.50, 0.75, 1.00
#### Interpretation
The PINN accurately captures:
- The spatial shape (sinusoidal mode)
- The temporal decay (exponential damping)
- No phase shift or amplitude drift, even at later times
#### Physical meaning
The model has learned:
- The dominant eigenmode of the heat equation
- The correct decay rate governed by the thermal diffusivity α
This shows the PINN has internalized the governing physics, not memorized discrete points.
This level of overlap is equivalent to a high-resolution numerical solver.
#### Absolute Error Heatmap (Space–Time)
Highlights regions of higher error and overall solution fidelity.

- Errors are uniformly low across most of the domain
- Slightly higher errors appear near early times (t ≈ 0) and the boundaries (x ≈ 0 and x ≈ 1)
- Maximum error ≈ 6.6 × 10⁻³
#### Interpretation
Higher error near t = 0 is common because the solution transitions sharply from the initial condition. Boundary regions are more sensitive due to the competing enforcement of BCs and the PDE. Importantly:
- No error blow-up
- No instability over time
#### Physical meaning
The PINN provides a globally consistent thermal field, suitable for:
- Design analysis
- Optimization
- Digital twin deployment
The smooth error structure indicates numerical stability, not overfitting.
### Quantitative Summary (From Metrics)
- RMSE ≈ 1.2 × 10⁻³
- Maximum absolute error ≈ 6.6 × 10⁻³
Interpretation:
- Errors are <1% of peak temperature
- Comparable to (or better than) coarse CFD / FDM grids
- Achieved without mesh generation
This validates PINNs as a credible alternative to traditional solvers.
### Big-Picture Insight
This experiment shows that the PINN:
- Learns physical laws, not just data
- Generalizes across the entire space–time domain
- Produces smooth, physically meaningful solutions
- Remains stable over long time horizons
In other words, we have built a working scientific machine learning solver.
### Final Insight
A Physics-Informed Neural Network was trained to solve the 1D transient heat equation, achieving sub-percent error and near-perfect agreement with analytical solutions across the full spaceβtime domain.
### Key Outputs
The project generates:
- Loss convergence curves
- PINN vs analytical temperature profiles
- Absolute error heatmaps
- Quantitative metrics (RMSE, max error)
### Applications
- Chemical Engineering: hot-spot detection, reactor thermal safety
- Industrial AI: physics-guided surrogate modeling
- Data Centers: thermal digital twins, cooling optimization (CRAC/CRAH, liquid loops)
- Energy & Aerospace: fast thermal solvers for what-if analysis
### Tools & Technologies
- Python
- PyTorch (automatic differentiation)
- NumPy
- Matplotlib
### Project Files
- Physics-Informed Neural Networks (PINNs).ipynb
- train.py – training, evaluation, and plotting
- pinn.py – PINN architecture and PDE residual
- utils.py – analytical solution and helpers
### Future Extensions
- 2D / 3D geometries
- Convection and reaction heat generation
- Inverse problems (estimating α or boundary heat flux)
- Reinforcement-learning-based control
- Streamlit or cloud-based dashboards
## Project 2 – Dynamic Temperature & Velocity Analysis in Engineering Systems
### Overview
This project applies data science, exploratory data analysis (EDA), and predictive modeling to analyze thermal and fluid dynamic behavior in two critical engineering systems:
- Heat Exchangers
- Chemical Reactors
By combining physics-based simulation, statistical summaries, and predictive trend analysis, the project demonstrates how data science can be used to monitor system stability, detect deviations, and support operational decision-making in industrial environments.
### Objectives
- Model temperature distribution across time and space
- Analyze dynamic temperature control in a reactor
- Perform EDA across multiple physical systems
- Compare observed vs predicted velocity profiles
- Evaluate prediction accuracy for reactor temperature
### System 1 – Heat Exchanger Temperature Distribution
This visualization shows how temperature evolves over time and distance inside a heat exchanger.

Key Insights
- Temperature decays with both time and axial distance
- Highlights thermal efficiency loss
- Useful for detecting ineffective heat transfer zones
### System 2 – Dynamic Temperature Control in a Chemical Reactor
A spatio-temporal view of temperature regulation inside a reactor.

Key Insights
- Oscillatory behavior indicates active control dynamics
- Spatial damping shows heat dissipation stability
- Applicable to process control and safety monitoring
### Exploratory Data Analysis (EDA) Across Systems
Summary statistics across three subsystems:
- Reactor Temperature
- Pipe Velocity
- Heat Exchanger Temperature

Metrics Analyzed
- Minimum
- Maximum
- Mean
- Standard Deviation
Why It Matters
- Identifies variability and risk
- Supports threshold setting
- Enables cross-system comparison
### Pipe Flow Analysis – Velocity vs Radius
Observed vs predicted velocity distribution along pipe radius.

Key Insights
- Near-parabolic profile consistent with laminar flow
- Prediction deviation near the pipe wall highlights model limitations
- Useful for hydraulic efficiency validation
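The near-parabolic shape referenced above is the classic Hagen–Poiseuille laminar profile; a minimal sketch (the radius and centerline velocity are illustrative numbers, not the project's values):

```python
import numpy as np

# Hagen-Poiseuille laminar velocity profile: v(r) = v_max * (1 - (r/R)^2)
R, v_max = 0.05, 1.0          # pipe radius [m], centerline velocity [m/s]
r = np.linspace(0, R, 50)
v = v_max * (1 - (r / R)**2)  # parabolic: maximum at the axis, zero at the wall

print(v[0], v[-1])  # centerline velocity, wall velocity (no-slip)
```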
### Predictive Modeling – Reactor Mid-Position Temperature
Comparison between observed and predicted reactor temperature over time.

Key Insights
- Model captures trend but smooths oscillations
- Demonstrates the bias–variance tradeoff
- Foundation for advanced ML control models
### Data Science Techniques Used
- Exploratory Data Analysis (EDA)
- Time-series trend analysis
- Physics-informed simulation
- Model vs observation comparison
- Visualization with Matplotlib & NumPy
### Tools & Technologies
- Python
- NumPy
- Matplotlib
- Jupyter Notebook
- Statistical Analysis
- Engineering Modeling
### Notebook
Heat Exchangers vs. Reactors – The Role of Dynamic Temperature & Fluid Velocity Profiles.ipynb
### Applications
- Process Engineering
- Chemical & Thermal Systems
- Manufacturing Optimization
- Predictive Maintenance
- Data-Driven Engineering Decisions
### Why This Project Matters
This project demonstrates how data science bridges theory and real-world engineering systems, enabling:
- Better control
- Higher efficiency
- Safer operations
- Smarter decision-making
## Project 3: Photolithography Yield Risk Prediction
AI-Driven Pass/Fail Modeling for Semiconductor Manufacturing
Project Type: Industrial Data Science · Manufacturing AI · Explainable ML
Dataset: SECOM Semiconductor Manufacturing Dataset (UCI ML Repository)
### Tools & Technologies
- Programming: Python
- Libraries: Scikit-Learn, Pandas, NumPy, Matplotlib
- Methods: Classification, Ensemble Learning, Explainable AI, Drift Monitoring
### Project Overview
Modern semiconductor manufacturing, especially photolithography, operates under extremely tight process windows. Small deviations in exposure, focus, thermal stability, or tool health can lead to critical dimension (CD) or overlay excursions, resulting in yield loss.
This project develops an AI-driven pass/fail risk prediction system using real semiconductor process sensor data.
The objective is to identify yield risk before downstream metrology, enabling:
- Earlier intervention
- Higher throughput
- Improved fab stability
### Business & Engineering Objective
Problem Statement
Can we predict whether a manufacturing run will PASS or FAIL specification using high-dimensional process sensor data, before final inspection?
Why This Matters
- AI chips require near-perfect yield
- Lithography tools are capital-intensive bottlenecks
- Early risk detection reduces:
- Scrap
- Rework
- Tool downtime
- Throughput loss
### Dataset Description
Source: UCI Machine Learning Repository – SECOM Dataset
| Attribute | Value |
|---|---|
| Samples | 1,567 manufacturing runs |
| Sensors | 590 process variables |
| Target | Pass / Fail |
Key Characteristics
- High dimensionality (p ≫ n)
- Severe class imbalance (~93% PASS, ~7% FAIL)
- Structured missing data (conditional sensors)
- Strong subsystem correlations
This makes the dataset highly realistic for semiconductor manufacturing analytics.
### Data Science Lifecycle (Photolithography Context)
#### 1. Problem Definition
- Predict yield risk (PASS/FAIL) prior to metrology
- Analogous to CD or overlay out-of-spec prediction
#### 2. Data Collection
Process telemetry representing:
- Exposure & focus proxies
- Thermal and environmental signals
- Tool subsystem health
#### 3. Data Understanding
- Sensor completeness analysis
- Missingness patterns
- Variability and correlation checks
#### 4. Data Cleaning & Wrangling
- Median imputation for missing values
- Retention of conditionally active sensors
- Stratified train/test split
#### 5. Exploratory Data Analysis (EDA)



EDA highlights:
- Severe class imbalance
- Mostly complete core sensors with conditional diagnostics
- Strong subsystem-level correlations
#### 6. Feature Engineering
- Robust scaling
- Imputation pipelines
- Preparation for nonlinear models
#### 7. Modeling
- Logistic Regression – baseline, interpretable
- Random Forest – nonlinear, subsystem-aware
#### 8. Model Evaluation

| Model | ROC-AUC |
|---|---|
| Logistic Regression | ~0.64 |
| Random Forest | ~0.78 |
Interpretation
- Yield risk is not linearly separable
- Nonlinear interactions between tool subsystems dominate
- Ensemble models better capture lithography behavior
#### 9. Operational Insight: Confusion Matrix

At default thresholds, the Random Forest behaves conservatively, flagging nearly all runs as FAIL.
Prediction → Decision
Threshold tuning is essential to balance yield protection against throughput.
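A sketch of threshold sweeping on an imbalanced synthetic stand-in (illustrative only, not the SECOM data; lowering the threshold trades precision for recall on the rare class):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Imbalanced synthetic stand-in: ~93% majority class, ~7% minority
X, y = make_classification(n_samples=3000, n_features=20, weights=[0.93],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
proba = RandomForestClassifier(random_state=0).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

# Sweep decision thresholds instead of using the default 0.5
for thr in (0.5, 0.3, 0.1):
    pred = (proba >= thr).astype(int)
    print(f"threshold={thr}: minority recall={recall_score(y_te, pred):.2f}")
```

In a fab setting the threshold would be chosen from the cost of a missed FAIL (scrap) versus a false alarm (throughput loss).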
### Explainability & Root-Cause Insight

Key observations:
- Only ~10–20 sensors dominate model decisions
- These reflect real lithography subsystems:
- Illumination stability
- Focus control
- Thermal regulation
- Stage dynamics
### Deployment & Drift Monitoring

- Drift monitored using Population Stability Index (PSI)
- Early-stage drift detected in key sensors
- No catastrophic shifts, but signals of:
- Tool aging
- Process change
- Recipe evolution
This mirrors how fabs monitor equipment health in production.
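PSI itself is straightforward to compute; a minimal sketch (the decile binning and the 0.1 "watch" rule of thumb are conventional choices, not project specifics):

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline and a current window."""
    # Interior cut points from baseline quantiles define equal-population bins
    cuts = np.percentile(expected, np.linspace(0, 100, bins + 1))[1:-1]
    e = np.bincount(np.searchsorted(cuts, expected), minlength=bins) / len(expected)
    a = np.bincount(np.searchsorted(cuts, actual), minlength=bins) / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)  # avoid log(0)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, 5000)
print(psi(baseline, rng.normal(0.0, 1, 5000)))  # stable sensor: small PSI
print(psi(baseline, rng.normal(0.5, 1, 5000)))  # shifted sensor: larger PSI
```

A common convention is PSI < 0.1 stable, 0.1–0.25 watch, > 0.25 significant drift.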
### Why This Project Stands Out
- Real semiconductor manufacturing data
- A high-dimensional, imbalanced industrial ML problem
- Strong focus on explainability and deployment readiness
- Direct relevance to AI chip production and advanced nodes
### Future Enhancements
- Threshold optimization for fab decision policies
- Precision-Recall analysis
- Risk banding (Green / Yellow / Red)
- SHAP-based local explanations
- Dashboard integration (Power BI / Plotly Dash)
### Project Files
Photolithography_Project.ipynb
## Project 4: Reliability Analysis & Survival Modeling
Kaplan–Meier, Hazard Functions, and Batch Comparison
### Project Overview
This project focuses on reliability engineering and time-to-event analysis using survival analysis techniques. The objective is to model failure behavior over time, quantify survival probabilities, and compare reliability performance across manufacturing batches.
The analysis applies industry-standard statistical methods widely used in manufacturing, aerospace, defense, and semiconductor reliability studies.
### Objectives
- Model time-to-failure behavior using Kaplan–Meier survival estimation
- Distinguish failure vs. censored observations
- Compare survival performance between Batch A and Batch B
- Analyze hazard (failure) rates over time
- Support data-driven reliability and quality decisions
### Methods & Techniques
- Kaplan–Meier estimator
- Censoring analysis
- Survival curve comparison by group
- Hazard function estimation
- Exploratory distribution analysis
- Confidence interval visualization
### Key Visualizations & Insights
#### 1. Kaplan–Meier Survival Curve
This plot estimates the probability that a unit survives beyond a given time.
- Steep early decline indicates early-life failures
- A gradual tail suggests wear-out behavior
- Confidence bands show estimation uncertainty over time
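For intuition, the Kaplan–Meier estimate can be computed by hand on a tiny sample (this sketch assumes untied failure times; in practice `lifelines`' `KaplanMeierFitter` handles ties and confidence intervals):

```python
import numpy as np

def kaplan_meier(times, events):
    """Survival curve points; events: 1 = failure observed, 0 = censored."""
    order = np.argsort(times)
    t = np.asarray(times, float)[order]
    e = np.asarray(events, int)[order]
    n, s, curve = len(t), 1.0, []
    for i in range(n):
        if e[i] == 1:            # only observed failures step the curve down
            at_risk = n - i      # units still under observation at time t[i]
            s *= 1.0 - 1.0 / at_risk
            curve.append((t[i], s))
    return curve

# Three units: failures at t=1 and t=3, one censored at t=2
print(kaplan_meier([1, 2, 3], [1, 0, 1]))  # survival drops to 2/3, then to 0
```

Note how the censored unit still counts in the risk set at t=1 but never triggers a drop, which is exactly why censoring-aware estimates beat naive averages.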

#### 2. Time-to-Event Distribution (Failure vs. Censored)
This visualization contrasts observed failures against censored observations.
- Failures dominate early time periods
- Censored observations increase at later times
- Confirms the need for survival modeling over simple averages

#### 3. Survival Probability Comparison (Batch A vs. Batch B)
This comparison highlights reliability differences between manufacturing batches.
- Batch A demonstrates consistently higher survival probability
- Batch B experiences earlier degradation
- Confidence intervals reflect statistical uncertainty

#### 4. Smoothed Survival Trends
Smoothed curves help reveal underlying reliability trends.
- Batch A shows delayed failure onset
- Batch B exhibits faster reliability decay
- Useful for management-level interpretation

#### 5. Hazard Function Analysis
The hazard function represents the instantaneous failure rate.
- An increasing hazard rate indicates an aging or wear-out failure mode
- Critical for maintenance planning and lifecycle decisions

### Business & Engineering Impact
- Enables predictive maintenance strategies
- Supports supplier and batch qualification
- Reduces unexpected failures and downtime
- Improves manufacturing reliability and quality control
- Applicable to aerospace, defense, electronics, and semiconductor systems
### Tools & Technologies
- Python
- NumPy, Pandas
- Matplotlib / Seaborn
- lifelines (survival analysis)
- Jupyter Notebook
Notebook: Reliability Analysis Using Weibull Modeling.ipynb
### Key Takeaway
Survival analysis provides a statistically robust framework to evaluate reliability, account for censored data, and compare manufacturing performance across batches, going far beyond traditional MTBF metrics.
## Project 5: Oath–Outcome Alignment Analysis
From Constitutional Promises to Measurable Outcomes
### Project Overview
This project applies data science, statistical modeling, and natural language processing (NLP) to evaluate whether real-world institutional outcomes align with the constitutional obligations defined in official government oaths.
Public institutions in the United States (military, law enforcement, judiciary, and civil government) derive their authority from oaths sworn to the U.S. Constitution. While these oaths establish clear legal and ethical obligations, there is limited quantitative research measuring how closely institutional behavior aligns with those commitments.
This project addresses that gap by converting normative legal principles into measurable signals and comparing them against observed institutional outcomes.
### Research Question
Do institutional outcomes align with the constitutional obligations defined in official oaths?
### Why This Matters
- Converts normative constitutional law into quantifiable metrics
- Bridges law, ethics, governance, and data science
- Moves beyond anecdotal accountability toward evidence-based oversight
- Rarely studied quantitatively in academic or policy literature
Relevant to:
- Oversight bodies
- Inspectors General
- Civil rights organizations
- Policy analysts
- Academic researchers
### Key Visualizations
#### Oath vs Outcome Radar Chart (Law Enforcement Example)

Interpretation
- Large gaps between oath commitments and outcomes indicate institutional misalignment
- Collapsed outcome area signals enforcement or accountability failure
- Symmetry would indicate constitutional compliance
#### Distribution of OOAS Across Agencies

Insight
- Left-skewed distributions highlight systemic negative alignment
- Outliers identify institutions requiring immediate oversight attention
#### OOAS Heatmap by Agency & State

Insight
- Enables cross-jurisdictional comparison
- Reveals geographic and institutional accountability disparities
### Methodology
Text analysis (NLP):
- Oath language extraction
- Constraint density and clarity scoring
Feature engineering:
- Accountability strength
- Power–constraint ratios
Statistical modeling:
- Regression analysis
- Institutional comparison
Visualization:
- Radar charts
- Heatmaps
- Trend analysis
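A purely illustrative sketch of an oath–outcome gap score (the dimensions, numbers, and formula below are hypothetical, not the project's actual OOAS definition):

```python
import numpy as np

# Hypothetical per-dimension scores on a 0-1 scale
dimensions = ["due process", "equal protection", "accountability", "transparency"]
oath = np.array([0.9, 0.8, 0.85, 0.7])     # commitment strength scored from oath text
outcome = np.array([0.6, 0.5, 0.4, 0.55])  # observed outcome metrics

# Toy alignment score: mean outcome-minus-commitment gap in [-1, 1];
# negative values mean outcomes fall short of sworn commitments
ooas = float(np.mean(outcome - oath))
print(ooas)
```

Plotting `oath` and `outcome` on the same polar axes is what produces the radar-chart comparison shown above.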
## Project 6: Data Center Insights with Data Science & Engineering
Operational Intelligence, Reliability, and Performance Optimization
### Project Overview
This project applies data science methods grounded in engineering principles to analyze and interpret data center operational behavior, focusing on thermal stability, energy consumption, and communication efficiency.
Modern data centers function as tightly coupled cyber-physical systems. Small deviations in temperature, power usage, or communication latency can propagate into equipment stress, efficiency loss, or reliability risk. This project demonstrates how engineering-aware analytics can support proactive monitoring and decision-making.
### Analytical Objectives
- Monitor and interpret thermal system behavior
- Evaluate power consumption patterns over operational cycles
- Compare communication latency across physical transmission media
- Translate engineering signals into data-driven operational insights
### Key Visual Analyses
#### Reactor / Equipment Temperature Monitoring (24-Hour Cycle)

Insight
- Shows diurnal temperature variation and peak thermal loading
- Dashed threshold highlights risk zones requiring alerts or control actions
- Demonstrates how time-series monitoring supports preventive intervention
- Directly applicable to thermal management of racks, cooling loops, and DAHUs
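A minimal sketch of this kind of threshold check (the diurnal model and the 28 °C threshold are invented for illustration, not measured data):

```python
import numpy as np

hours = np.arange(24)
# Illustrative diurnal model: 24 C baseline with a 6 C daily swing peaking mid-afternoon
temp_c = 24 + 6 * np.sin(2 * np.pi * (hours - 9) / 24)
threshold_c = 28.0

alert_hours = hours[temp_c > threshold_c]  # hours that would trigger an alert
print(alert_hours)
```

The same pattern, applied to streaming sensor telemetry instead of a sine model, is the basis of the preventive-intervention alerts described above.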
#### Power Consumption Patterns in a Data Center

Insight
- Captures cyclical load behavior across a 24-hour operational window
- Peak demand periods correlate with increased cooling and compute activity
- Supports:
  - Energy efficiency optimization
  - Capacity planning
  - PUE-oriented performance analysis
#### Communication Latency: Optical vs Electrical Transmission

Insight
- Quantifies latency growth as a function of distance
- Demonstrates superior scalability of optical fiber for low-latency environments
- Reinforces engineering trade-offs in network design for data centers
- Relevant to:
  - High-performance computing
  - Low-latency cloud services
  - Backbone infrastructure planning
### Engineering + Data Science Integration
This project explicitly connects:
- Physical system behavior (temperature, power, signal propagation)
- Data science tools (EDA, visualization, trend analysis)
- Engineering constraints (thresholds, efficiency limits, reliability curves)
Rather than treating data as abstract, each variable is interpreted within its physical and operational context.
### Deliverables
- Jupyter Notebook with reproducible analysis
- Engineering-driven visual analytics
- Operational insights for infrastructure optimization
- Documentation linking analytics to real data center systems
### Future Extensions
- Predictive maintenance models (thermal & electrical)
- Time-series forecasting of energy demand
- Anomaly detection for early fault identification
- Integration with real sensor telemetry (IoT / BMS / EPMS)
### Project Files
The Data Center insights with Data science and engineering (1).ipynb
## Project 7: Wi-Fi Optimization & Communication Performance Analysis
Signal Quality, Reliability, and Network Efficiency
### Project Overview
This project applies data science, signal processing concepts, and network engineering principles to analyze wireless communication performance, with a focus on signal reliability, coverage quality, and user-level optimization.
Wireless networks are fundamental to modern digital infrastructure, yet their performance is constrained by noise, interference, distance, and infrastructure placement. This project demonstrates how engineering-informed analytics can be used to evaluate and optimize Wi-Fi performance using quantitative signal metrics.
### Analytical Objectives
- Quantify the relationship between Signal-to-Noise Ratio (SNR) and Bit Error Rate (BER)
- Analyze spatial Wi-Fi coverage quality across a service area
- Evaluate user-level throughput optimization under SNR constraints
- Support data-driven decisions for access-point placement and network tuning
### Key Visual Analyses
#### Bit Error Rate vs Signal-to-Noise Ratio (QPSK)

Insight
- Demonstrates the exponential reduction in bit errors as SNR increases
- Highlights the reliability threshold required for stable digital communication
- Reinforces theoretical expectations from digital modulation and communication theory
- Relevant to network design, error control, and quality-of-service planning
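The theoretical QPSK curve can be reproduced directly from the standard AWGN result, BER = 0.5 · erfc(√(Eb/N0)) (the dB sample points below are arbitrary):

```python
import math

def qpsk_ber(ebn0_db):
    """Theoretical QPSK bit error rate over an AWGN channel."""
    ebn0 = 10 ** (ebn0_db / 10)          # convert dB to linear Eb/N0
    return 0.5 * math.erfc(math.sqrt(ebn0))

for db in (0, 4, 8, 12):
    print(f"{db:>2} dB -> BER = {qpsk_ber(db):.2e}")
```

Sweeping `ebn0_db` over a fine grid and plotting on a log scale reproduces the waterfall curve shown above.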
#### Wi-Fi Coverage Map: Average SNR by Region

Insight
- Visualizes spatial variation in signal quality
- Identifies low-SNR regions requiring infrastructure improvement
- Supports access-point optimization and coverage gap detection
- Applicable to enterprise networks, campuses, and data center environments
#### User Optimization: SNR vs Throughput

Insight
- Shows throughput sensitivity to SNR degradation
- Threshold lines highlight performance drop-off zones
- Enables classification of low-quality user experiences
- Supports intelligent AP assignment and load balancing strategies
### Engineering & Data Science Integration
This project integrates:
- Communication theory (SNR, BER, modulation efficiency)
- Statistical visualization and analysis
- Network performance engineering
- Optimization logic grounded in real-world constraints
Each result is interpreted in the context of physical signal behavior and network performance limits.
### Deliverables
- Jupyter Notebook with reproducible simulations
- Communication performance visualizations
- Network optimization insights
- Engineering-aware documentation
### Future Enhancements
- Adaptive modulation and coding analysis
- Machine-learning-based AP selection
- Time-varying interference modeling
- Integration with real Wi-Fi telemetry data
### Project Files
Wifi optimization (1).ipynb
## Author
Jemael Nzihou
PhD Student – Data Science
Chemical Engineer | Business Analytics | Quality Champion certified
Portfolio: https://jemaelnzihou.github.io/Jemael-Nzihou-Portfolio/
LinkedIn: https://www.linkedin.com/in/jemaelnzihou
## License
This project is released for research and educational use. Please cite appropriately if used in academic or policy work.