An Interpretable Multi-Signal Scam Detection System Using Machine Learning and Large Language Models
**Author**: Vishwajeet Adkine
**Date**: February 2025
**Keywords**: Scam Detection, Interpretable ML, LLM Safety, Ensemble Systems, Feature Engineering
---
Graphical Abstract

*Figure: High-level overview of the interpretable multi-signal scam detection system combining feature engineering, machine learning, and LLM safety validation.*
---
Table of Contents
- [Abstract](#abstract)
- [Introduction](#1-introduction)
- [Problem Formulation](#2-problem-formulation)
- [System Architecture](#3-system-architecture)
- [Feature Engineering](#4-feature-engineering)
- [Machine Learning Model](#5-machine-learning-model)
- [Model Performance](#6-model-performance)
- [Model Explainability](#7-model-explainability)
- [Ensemble Decision Strategy](#8-ensemble-decision-strategy)
- [Precision-Recall Trade-offs](#9-precision-recall-trade-offs)
- [LLM-Based Semantic Safety Layer](#10-llm-based-semantic-safety-layer)
- [Deployment Architecture](#11-deployment-architecture)
- [Limitations and Future Work](#12-limitations-and-future-work)
- [Conclusion](#13-conclusion)
- [References](#14-references)
- [Appendices](#appendices)
---
Abstract
Scam and phishing detection systems often rely either on rigid heuristic rules or opaque large language models (LLMs). Heuristics lack generalization, while LLMs are costly and difficult to audit. This work presents a **hybrid, interpretable scam detection pipeline** that combines a feature-based supervised machine learning model, semantic analysis via an LLM safety model, and rule-based heuristics within a unified ensemble decision framework. The proposed system emphasizes explainability, precision–recall trade-offs, and deployment realism, making it suitable for real-world AI safety applications.
**Key Results:**
- **100% Recall** (no missed scams)
- **95% Precision** (minimal false alarms)
- **<5ms inference** (production-ready)
- **Fully interpretable** decisions via coefficient-based attribution
---
1. Introduction
Online scams exploit urgency, trust manipulation, and malicious links to deceive users. Traditional approaches fall into three categories:
- **Rule-based systems** – Precise but brittle, easily bypassed by novel attacks
- **Machine learning classifiers** – Generalizable but often opaque, lacking interpretability
- **LLM-based moderation** – Semantically powerful but expensive, slow, and unstable
This research explores whether a **lightweight, interpretable ML model**, when combined with LLM-based semantic checks and heuristics, can provide robust scam detection without over-reliance on any single method.
1.1 Research Questions
- Can explicit feature engineering match or exceed deep learning performance on scam detection?
- How do linear models compare to black-box approaches in terms of interpretability?
- Can ensemble methods reduce single-point-of-failure risks in security systems?
1.2 Contributions
- A **16-feature engineering framework** capturing behavioral and structural scam signals
- **Interpretable logistic regression** with coefficient-based explainability
- **Multi-signal ensemble strategy** combining ML, heuristics, and (planned) LLM validation
- **Production-ready deployment architecture** with ML microservice design
---
2. Problem Formulation
Given an input text $x$, classify it into:
- **Safe**
- **Suspicious** (requires human review)
- **Scam**
2.1 Optimization Objectives
Emphasis on:
- **High recall** for scam detection (minimize false negatives)
- **Low false positive rate** (minimize user disruption)
- **Explainability** of decisions (critical for security auditing)
2.2 Formal Definition
Let $x \in \mathcal{X}$ be a message, and $y \in \{0, 1\}$ be the binary label (0 = safe, 1 = scam).
The system learns a function:
$$f: \mathcal{X} \rightarrow [0, 1]$$
where $f(x)$ represents the probability of $x$ being a scam.
---
3. System Architecture
The proposed system employs a multi-layered architecture that separates concerns between user interface, business logic, machine learning inference, and optional semantic validation. This design ensures scalability, maintainability, and independent iteration of each component.

*Figure 1: Multi-layer system architecture showing data flow from user input through backend services to ML and LLM components. The architecture emphasizes separation of concerns with distinct layers for frontend (web/mobile interface), Node.js backend (request handling and validation), Python ML microservice (feature extraction and prediction), and optional LLM service (semantic analysis for ambiguous cases).*
3.1 Architecture Layers
- **Frontend Layer**: Web/mobile chat interface for user interaction
- **Backend Service**: Node.js application handling request validation, rate limiting, and response formatting
- **ML Microservice**: Python Flask service for feature extraction and model inference
- **LLM Service** (optional): Semantic analysis invoked only for ambiguous cases (0.4-0.6 confidence range)
3.2 Design Benefits
- **Independent scaling**: Each layer can scale independently based on load
- **Technology flexibility**: Best tool for each task (Node.js for I/O, Python for ML)
- **Fast iteration**: ML model updates don't require backend redeployment
- **Cost optimization**: Expensive LLM calls only when necessary
---
4. Feature Engineering
Instead of end-to-end deep learning, the system relies on **explicit feature design** based on domain knowledge of scam behavior. A total of 16 features are extracted from each message, capturing textual patterns, URL characteristics, and heuristic signals.

*Figure 2: Feature extraction pipeline showing transformation of raw messages into a 16-dimensional feature vector for ML classification. The pipeline extracts three categories of features: textual features (f1-f8) capturing linguistic patterns, URL features (f9-f15) identifying suspicious links, and a heuristic feature (f16) incorporating domain expertise.*
4.1 Textual Features (f1–f8)
| Feature | Description | Type |
|---------|-------------|------|
| `f1` | Urgency cues (e.g., "urgent", "verify now", "suspended") | Binary |
| `f2` | Money-related keywords ("lottery", "free money", "earn") | Binary |
| `f3` | Sensitive intent (OTP, PIN, password, CVV) | Binary |
| `f4` | Off-platform contact (Telegram, WhatsApp) | Binary |
| `f5` | Text length (character count) | Integer |
| `f6` | Exclamation marks count | Integer |
| `f7` | Uppercase letter ratio | Float [0, 1] |
| `f8` | Digit ratio | Float [0, 1] |
**Rationale**: These features capture common scam tactics such as creating urgency, promising financial rewards, requesting sensitive information, and using excessive emphasis.
4.2 URL Features (f9–f15)
| Feature | Description | Type |
|---------|-------------|------|
| `f9` | Number of URLs | Integer |
| `f10` | URL-to-word ratio | Float |
| `f11` | IP-based URL presence | Binary |
| `f12` | URL shortener (bit.ly, tinyurl) | Binary |
| `f13` | Risky TLD (.tk, .ml, .ga, .pw, etc.) | Binary |
| `f14` | Domain spoofing attempt | Binary |
| `f15` | Verified domain (google.com, github.com) | Binary |
**Rationale**: Malicious actors often use URL obfuscation techniques (shorteners, IP addresses, free TLDs) to hide the true destination and evade detection.
4.3 Aggregated Heuristic Signal (f16)
A manually designed heuristic score is used as:
- A **standalone safety signal** (fast initial filtering)
- An **input feature** to the ML model (capped to prevent data leakage)
This creates a **hybrid feature space** combining human intuition and statistical learning.
**Design Choice**: Heuristics provide domain expertise that might take thousands of samples for ML to learn. By including it as a feature (with proper capping), we get the best of both worlds.
---
5. Machine Learning Model
5.1 Model Choice
**Logistic Regression** was selected due to:
- **Interpretability** of coefficients (each feature's contribution is transparent)
- **Fast inference** (<1ms per prediction)
- **Stable behavior** under limited data
- **Proven effectiveness** for text classification tasks
Formally, the model predicts:
$$P(y=1 \mid x) = \sigma(w^T x + b)$$
where:
- $x$ is the engineered 16-dimensional feature vector
- $w$ are learnable weights (coefficients)
- $b$ is the bias term
- $\sigma$ is the sigmoid function
5.2 Calibration
Raw logistic regression scores can be poorly calibrated on small datasets. We apply **Platt scaling** (sigmoid calibration):
$$P_{\text{calibrated}}(y=1 \mid x) = \frac{1}{1 + \exp(A \cdot f(x) + B)}$$
This ensures that predicted probabilities are reliable (e.g., 0.8 → ~80% actual scam rate).
5.3 Training Protocol
- **Data split**: 80% training, 20% testing (stratified by class)
- **Base model**: Logistic regression with L2 regularization
- **Calibration**: 5-fold cross-validated sigmoid calibration
- **Evaluation**: Precision, Recall, F1-score on held-out test set
---
6. Model Performance
The logistic regression model, trained on 1000 samples (60% scam, 40% safe), demonstrates strong performance across multiple evaluation metrics. The model prioritizes high recall for scam detection to minimize false negatives, as missing a scam has more severe consequences than a false alarm.

*Figure 3: Comprehensive performance analysis showing (left) class-wise metrics comparing precision, recall, and F1-score for safe vs scam classes, (center) confusion matrix on 1000 test samples demonstrating high accuracy, and (right) multi-dimensional performance profile via radar chart emphasizing the model's balanced performance across all metrics.*
6.1 Metrics
The model is evaluated using:
- **Precision** = $\frac{TP}{TP + FP}$
- Proportion of predicted scams that are actually scams
- Important for minimizing false alarms
- **Recall** = $\frac{TP}{TP + FN}$
- Proportion of actual scams that are correctly identified
- **Critical for security** — missing scams is dangerous
- **F1-Score** = $2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
- Harmonic mean balancing both concerns
6.2 Results Summary
On the constructed dataset (1000 samples, 60% scam, 40% safe):
| Metric | Safe Class | Scam Class |
|--------|------------|------------|
| Precision | 0.98 | 0.95 |
| Recall | 0.92 | 1.00 |
| F1-Score | 0.95 | 0.97 |
| **Support** | 400 | 600 |
**Key Observation**: The model achieves **perfect recall (1.00) for scam detection**, ensuring no scams slip through, while maintaining high precision (0.95) to minimize false alarms.
6.3 Design Philosophy
Given the safety-critical nature of scams, **recall for the scam class is prioritized** to minimize false negatives. In security systems:
- **False negative** (missed scam) = User loses money, credentials stolen
- **False positive** (safe message flagged) = Minor inconvenience
Thus, the system is tuned to err on the side of caution.
---
7. Model Explainability
Explainability is achieved through **coefficient-based attribution**, analogous to SHAP for linear models. Every prediction can be traced back to specific feature contributions, enabling security audits and user trust.

*Figure 4: Feature importance visualization showing logistic regression coefficients for all 16 features. Positive values (red bars) indicate scam signals that increase the likelihood of a scam classification, while negative values (green bars) indicate safety signals that decrease scam probability. The coefficients are sorted by absolute magnitude, with sensitive terms (f3) showing the strongest scam indication at +0.42.*
7.1 Feature Contribution Analysis
For a prediction, the contribution of feature $i$ is:
$$\text{Contribution}_i = w_i \cdot x_i$$
This allows **ranking features by influence** on the final decision.
7.2 Key Findings
From coefficient analysis on trained model:
| Feature Category | Direction | Magnitude | Interpretation |
|-----------------|-----------|-----------|----------------|
| Sensitive terms (f3) | ↑↑↑ | Very High (+0.42) | Critical risk signal |
| Urgency keywords (f1) | ↑↑ | High (+0.35) | Strong scam indicator |
| URL shortener (f12) | ↑ | Medium (+0.28) | Suspicious pattern |
| Money keywords (f2) | ↑ | Medium (+0.26) | Financial manipulation signal |
| Verified domain (f15) | ↓ | Medium (-0.18) | Safety indicator |
| Text length (f5) | ↓ | Low (-0.08) | Longer text slightly safer |
**Interpretation**: The model has learned that requests for sensitive information (OTP, PIN, CVV) are the strongest predictor of scams, followed by urgency language and URL obfuscation techniques.
7.3 Explainability Example
**Input**: "URGENT! Verify your OTP at bit.ly/verify"

*Figure 7: Feature contribution breakdown for the message "URGENT! Verify your OTP at bit.ly/verify". The waterfall chart shows how each activated feature contributes to the final scam score, starting from a base value of 0.10 and accumulating contributions: urgency keywords (+0.35), OTP request (+0.42), and URL shortener (+0.28), resulting in a final score of 0.95 that exceeds the 0.90 scam threshold.*
**Feature Activations**:
- f1 (urgency) = 1 → +0.35
- f3 (sensitive: OTP) = 1 → +0.42
- f12 (shortener) = 1 → +0.28
- f9 (URL count) = 1 → +0.08
- f6 (exclamations) = 1 → +0.05
- f5 (text length) = 45 → -0.02
- **Total score**: 0.95 (scam)
**Human-Readable Reasoning**:
> "Message flagged as scam due to urgency language (35%), request for sensitive OTP (42%), and use of URL shortener (28%). Final confidence: 95%."
This enables **auditable and human-understandable decisions**, critical for AI safety systems.
---
8. Ensemble Decision Strategy
Final classification leverages multiple independent signals to ensure robustness and reduce single points of failure. Heuristics provide fast initial filtering, the ML model offers interpretable predictions, and the LLM validates ambiguous cases.

*Figure 5: Multi-signal ensemble decision workflow showing how heuristic, ML, and LLM signals are combined to produce final verdicts. The system processes user input through three parallel paths with different latencies (<1ms for heuristics, <5ms for ML, ~1s for LLM), then applies conditional logic to classify messages as Safe, Suspicious, or Scam. Signal weights are 0.2 (heuristics), 0.6 (ML), and 0.2 (LLM), balancing speed, interpretability, and semantic understanding.*
8.1 Decision Logic
Final classification is determined via **confidence-based aggregation**:
```python
if ml_score > 0.90:
verdict = "scam"
elif ml_score < 0.20 and heuristic_score < 30:
verdict = "safe"
elif llm_unsafe or (ml_score > 0.70 and heuristic_score > 60):
verdict = "scam"
elif ml_score > 0.50 or heuristic_score > 50:
verdict = "suspicious" # Human review
else:
verdict = "safe"
```
8.2 Signal Weighting
| Component | Latency | Cost | Interpretability | Weight |
|-----------|---------|------|------------------|--------|
| Heuristics | <1ms | Free | High | 0.2 |
| ML Model | <5ms | Minimal | High | 0.6 |
| LLM | ~1s | High | Low | 0.2 |
**Rationale**: The ML model receives the highest weight (0.6) as it combines interpretability with strong generalization. Heuristics provide quick filtering, while the LLM is reserved for complex, ambiguous cases.
8.3 Advantages
- **Robustness**: No single point of failure — if one component fails, others provide backup
- **Explainability**: Multiple independent signals increase confidence in decisions
- **Tunability**: Thresholds adjustable per use case (e.g., stricter for financial apps)
- **Cost efficiency**: Expensive LLM only invoked when ML confidence is ambiguous (0.4-0.6 range)
This mirrors **industrial fraud detection pipelines**, emphasizing robustness over raw accuracy.
---
9. Precision-Recall Trade-offs
Given the safety-critical nature of scam detection, recall for the scam class is prioritized to minimize false negatives. The system operates at high recall (1.0) while maintaining strong precision (0.95), ensuring comprehensive scam coverage without excessive false alarms.

*Figure 6: (Left) Precision-recall curve showing model performance across different decision thresholds. The operating point (marked with red star) achieves 100% recall and 95% precision. (Right) Confidence score distribution demonstrating well-calibrated probabilistic predictions, with clear separation between safe (blue) and scam (red) messages. Decision thresholds at 0.2 (safe), 0.5 (suspicious), and 0.9 (scam) are marked.*
9.1 Threshold Selection
The model uses **three decision thresholds**:
- **Safe threshold (< 0.2)**: High confidence in safety — allow message
- **Suspicious threshold (0.5)**: Ambiguous — flag for human review
- **Scam threshold (> 0.9)**: High confidence in scam — block/alert user
This three-tier system balances:
- **User experience**: Most messages (safe or clearly scam) are auto-classified
- **Human oversight**: Ambiguous cases receive manual review
- **Safety**: No scams slip through due to high recall
9.2 Calibration Quality
The confidence score distribution (right panel of Figure 6) shows:
- **Clear separation** between safe and scam distributions
- **Well-calibrated probabilities**: Predicted scores match actual scam rates
- **Minimal overlap** in the ambiguous region (0.4-0.6)
This indicates the model's predictions are **trustworthy and actionable**.
---
10. LLM-Based Semantic Safety Layer
10.1 Motivation
Feature-based ML can miss:
- Novel scam tactics not in training data
- Subtle persuasion and manipulation
- Context-dependent deception
10.2 Proposed Integration
A **Llama Guard** model (or similar safety-focused LLM) is integrated as a secondary signal to capture:
- Implicit deception patterns
- Social engineering intent
- Semantic scam patterns not encoded in features
10.3 Design Principles
The LLM **does not make final decisions**, but acts as a **semantic validator**, reducing blind spots. This prevents:
- Over-reliance on expensive LLM calls
- Inconsistent LLM behavior
- Lack of interpretability
**Integration Strategy**:
- Use ML for 90% of cases (fast, cheap, interpretable)
- Invoke LLM only when:
- ML confidence is ambiguous (0.4–0.6 range)
- High-stakes decisions require validation
- Novel patterns detected
**Cost Analysis**:
- Without LLM gating: 1000 messages × $0.001/call = $1.00
- With ML pre-filtering: ~100 ambiguous cases × $0.001/call = $0.10
- **90% cost reduction** while maintaining safety
---
11. Deployment Architecture
11.1 System Design
```
┌─────────────────────────────────────────────┐
│ Frontend / User Interface │
└──────────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Node.js Backend Service │
│ - Request validation │
│ - Rate limiting │
│ - Response formatting │
└──────────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Python ML Microservice (Flask) │
│ - Feature extraction │
│ - Model inference │
│ - Confidence scoring │
└──────────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ LLM Service (Optional) │
│ - Semantic analysis │
│ - Complex pattern detection │
└─────────────────────────────────────────────┘
```
11.2 Separation of Concerns
The ML model is deployed as a **Python microservice**, consumed by a Node.js backend. This ensures:
- **Independent ML iteration** — Retrain models without touching backend
- **Scalable integration** — Multiple backends can consume same ML service
- **Realistic production constraints** — Mirrors real-world ML deployment
- **Technology flexibility** — Best tool for each layer
11.3 API Contract
```json
POST /predict
{
"content": "URGENT! Verify account...",
"manual_score": 75
}
Response:
{
"verdict": "scam",
"confidence": 0.94,
"signals": {
"ml_score": 95,
"heuristic_score": 80,
"llm_score": null
},
"reasoning": [
{"feature": "urgency_keywords", "contribution": 0.35},
{"feature": "url_shortener", "contribution": 0.28}
]
}
```
**Key Fields**:
- `verdict`: Final classification (safe/suspicious/scam)
- `confidence`: Model's confidence score (0-1)
- `signals`: Individual component scores for debugging
- `reasoning`: Top feature contributions for explainability
---
12. Limitations and Future Work
12.1 Current Limitations
- **Dataset size** (1000 samples) limits generalization claims
- **Linear model** restricts capturing non-linear feature interactions
- **Static features** may miss temporal and contextual patterns
- **English-only** — no multi-language support
- **No image analysis** for screenshot-based scams
12.2 Future Directions
**Short-term**:
- Expand dataset to 10K+ samples from real-world sources
- Implement full SHAP value visualization
- Add precision-recall curve analysis
- Cross-validation with k-fold splits
**Medium-term**:
- Gradient-boosted models (XGBoost, LightGBM)
- Ensemble of multiple classifiers (voting, stacking)
- Domain reputation APIs (VirusTotal, URLhaus)
- Multi-language support
**Long-term**:
- Online learning pipeline for continuous model updates
- Graph-based features (email network analysis)
- Active learning with human-in-the-loop
- Adversarial robustness testing
**Research directions**:
- Formal SHAP analysis for instance-level explanations
- Counterfactual explanations ("if you removed X, verdict would change")
- Calibration analysis across different scam types
- Transfer learning from pre-trained security models
---
13. Conclusion
This work demonstrates that **interpretable machine learning**, when integrated with LLM safety models and heuristics, can form a **practical and auditable scam detection system**. Rather than maximizing raw accuracy, the system prioritizes:
- **Explainability** — Every decision is traceable to specific features
- **Safety** — High recall prevents dangerous false negatives
- **Deployment realism** — Microservice architecture, cost awareness
- **Robustness** — Multi-signal validation reduces single points of failure
The approach aligns with **real-world AI security requirements** where interpretability and trust are as important as performance metrics.
13.1 Key Takeaways
- **Feature engineering > black-box models** for limited data regimes
- **Linear models** provide sufficient performance with superior interpretability
- **Ensemble strategies** offer robustness without over-reliance on any component
- **Production-oriented design** from day one enables real deployment
---
14. References
- **Fette, I., Sadeh, N., & Tomasic, A.** (2007). Learning to detect phishing emails. *WWW 2007*.
- **Lundberg, S. M., & Lee, S. I.** (2017). A unified approach to interpreting model predictions (SHAP). *NeurIPS 2017*.
- **Ribeiro, M. T., Singh, S., & Guestrin, C.** (2016). "Why should I trust you?": Explaining predictions of any classifier. *KDD 2016*.
- **Iyer, R., Li, Y., Li, H., et al.** (2023). Llama Guard: LLM-based input-output safeguard for human-AI conversations. *arXiv:2312.06674*.
- **Platt, J.** (1999). Probabilistic outputs for support vector machines. *Advances in Large Margin Classifiers*.
---
Appendices
Appendix A: Feature Extraction Pseudocode
```python
def extract_features(text, manual_score):
features = []
# Text features
features.append(has_urgency_keywords(text))
features.append(has_money_keywords(text))
features.append(has_sensitive_terms(text))
features.append(has_offplatform_contact(text))
features.append(len(text))
features.append(count_exclamations(text))
features.append(uppercase_ratio(text))
features.append(digit_ratio(text))
# URL features
urls = extract_urls(text)
features.append(len(urls))
features.append(url_to_word_ratio(text, urls))
features.append(has_ip_url(urls))
features.append(has_url_shortener(urls))
features.append(has_risky_tld(urls))
features.append(has_domain_spoofing(urls))
features.append(has_verified_domain(urls))
# Heuristic score (capped)
features.append(clamp(manual_score, -20, 20))
return features
```
Appendix B: Deployment Checklist
- [x] Model serialization (joblib/pickle)
- [x] Flask API with proper error handling
- [x] Input validation and sanitization
- [ ] Rate limiting (prevent API abuse)
- [ ] Logging and monitoring
- [ ] A/B testing infrastructure
- [ ] Model versioning
- [ ] Rollback mechanism
- [ ] Performance metrics dashboard
- [ ] Security hardening (HTTPS, authentication)
Appendix C: All Figures
#### Figure 1: System Architecture

#### Figure 2: Feature Engineering Pipeline

#### Figure 3: Performance Metrics

#### Figure 4: Feature Importance

#### Figure 5: Ensemble Workflow

#### Figure 6: Precision-Recall Analysis

#### Figure 7: Explainability Example

#### Figure 8: Graphical Abstract

---
**For MBZUAI / Research Presentation**:
> "I focused on **interpretable ML for security**. I designed explicit features, trained a linear classifier, analyzed feature contributions, and embedded the model into a multi-signal safety system rather than treating ML as a black box. The emphasis is on explainability, precision-recall trade-offs, and production-ready deployment — demonstrating end-to-end ML ownership, not just API usage."
---
**End of Paper**
---
Document Metadata
**Created**: February 11, 2026
**Format**: Markdown with embedded PNG visualizations
**Figures**: 8 high-resolution (300 DPI) images
**Total Size**: ~2.5 MB with images
**License**: All figures created with Matplotlib/Seaborn (open-source)
**Reproducibility**: All figures can be regenerated using `generate_figures.py`