Research & Case Study

AI-Driven Scam Detection using Text & Link Patterns

Author: Vishwajeet Adkine | AI Security & Reliability
Published: November 04, 2025

Core Idea / Abstract

As online scams get sneakier, relying on static security rules just isn't enough. This research introduces an AI detective that spots scams by looking at three factors at once: the language of the message (Natural Language Processing, NLP), the registration and reputation records of any embedded links (WHOIS data and the Google Safe Browsing API), and the structure of the URLs themselves.

The system aims to create a faster, more reliable defense by treating both the content of a message and the structure/provenance of any embedded links as fundamentally untrusted data points that must be cross-verified.

Methodology: How the AI Works

The scam detection system runs three distinct analytical modules and combines their findings into a single risk score:

1. It Reads the Message's Mind (NLP/Text Analysis)

The AI uses advanced language processing to analyze the actual text for high-risk emotional and rhetorical patterns. It specifically looks for the classic "scam voice"—words that create panic ("Your account is suspended!") or sound too good to be true ("You've won a prize!"). This module identifies psychological manipulation techniques common in phishing and scam messages.
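To make the idea of a "scam voice" concrete, here is a minimal sketch that flags urgency and reward cues with a small hand-written lexicon. This is a deliberately simple stand-in for the NLP model described above, not the author's implementation; the cue lists, the `matched_categories` and `text_risk_score` names, and the scoring rule are all assumptions made for illustration.

```python
import re

# Illustrative cue lists (assumptions, not the lexicon used by the real system).
CUES = {
    "urgency": ("suspended", "verify now", "immediately", "urgent", "act now"),
    "reward": ("you've won", "prize", "free gift", "claim your"),
}


def matched_categories(message: str) -> list:
    """Return the names of the cue categories that appear in the message."""
    # Lowercase, collapse whitespace, and normalize curly apostrophes.
    text = re.sub(r"\s+", " ", message.lower()).replace("\u2019", "'")
    return [name for name, cues in CUES.items() if any(c in text for c in cues)]


def text_risk_score(message: str) -> float:
    """Score in [0, 1]: the fraction of cue categories present in the message."""
    return len(matched_categories(message)) / len(CUES)


if __name__ == "__main__":
    for msg in ("Your account is suspended!", "See you at lunch tomorrow."):
        print(msg, matched_categories(msg), text_risk_score(msg))
```

A trained classifier would replace the lexicon lookup, but the interface (message in, score in [0, 1] out) could stay the same.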

2. It Checks the Link's ID (WHOIS & Safe Browsing)

If a message includes a URL, the AI initiates a background check using external security APIs:

WHOIS: Queries the registration database to see if the website is brand new (a big red flag for disposable scam sites).
Safe Browsing: Queries Google's continuously updated lists of unsafe web resources to see if the site has already been flagged as malware, unwanted software, or social engineering (phishing).
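The two background checks can be sketched as follows. This is a hedged illustration rather than the deployed code: it assumes the WHOIS creation date has already been obtained as a timezone-aware `datetime`, the 30-day threshold and the `clientId` value are made up, and the request body follows the shape used by the Safe Browsing Lookup API v4 (`threatMatches:find`). No network call is made here.

```python
from datetime import datetime, timezone
from typing import Optional

# Safe Browsing Lookup API (v4) endpoint; the API key is passed as ?key=...
SAFE_BROWSING_ENDPOINT = "https://safebrowsing.googleapis.com/v4/threatMatches:find"


def domain_age_days(created: datetime, now: Optional[datetime] = None) -> int:
    """Whole days between the WHOIS creation date and now (both timezone-aware)."""
    now = now or datetime.now(timezone.utc)
    return (now - created).days


def is_newly_registered(created: datetime, now: Optional[datetime] = None,
                        threshold_days: int = 30) -> bool:
    """Flag domains younger than the threshold (30 days is an assumed value)."""
    return domain_age_days(created, now) < threshold_days


def safe_browsing_payload(url: str) -> dict:
    """Build the JSON body for a threatMatches:find request for one URL."""
    return {
        # clientId and clientVersion are placeholders chosen for this sketch.
        "client": {"clientId": "scam-detector-sketch", "clientVersion": "0.1"},
        "threatInfo": {
            "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING", "UNWANTED_SOFTWARE"],
            "platformTypes": ["ANY_PLATFORM"],
            "threatEntryTypes": ["URL"],
            "threatEntries": [{"url": url}],
        },
    }
```

The payload would be sent as a POST request to `SAFE_BROWSING_ENDPOINT`; a response with no matches means the URL is not on the queried lists, which is not the same as proof that it is safe.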

3. It Examines the Link's Face (Pattern Analysis)

This module closely inspects the URL's structure for subtle, hand-crafted phishing tricks:

Homoglyphs: Checks for characters that look like others (e.g., swapping 'l' for '1' or using Cyrillic characters to mimic Latin letters).
Subdomain Abuse: Detects attempts to hide the real domain (e.g., `paypal.login.scamsite.com`).
Excessive Encoding: Flags unusual character encoding meant to confuse automated scanners.
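The three structural checks above might look like this in a minimal form. Everything here is an assumption made for illustration: the `BRANDS` watch list, the digit lookalike map, the escape threshold, and the shortcut of treating the second-to-last label as the registered domain (a real system would consult the Public Suffix List to handle suffixes such as `co.uk`).

```python
import re
import unicodedata
from urllib.parse import urlsplit

BRANDS = ("paypal", "google", "apple")          # hypothetical watch list
DIGIT_LOOKALIKES = str.maketrans({"0": "o", "1": "l", "3": "e", "5": "s"})
PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")
MAX_ESCAPES = 5                                  # assumed threshold


def has_mixed_scripts(host: str) -> bool:
    """True if the hostname's letters come from more than one script."""
    scripts = set()
    for ch in host:
        if ch.isalpha():
            # The first word of the Unicode name is the script, e.g. LATIN or CYRILLIC.
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return len(scripts) > 1


def url_pattern_flags(url: str) -> list:
    """Return the names of the structural red flags found in the URL."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    flags = []
    if has_mixed_scripts(host):
        flags.append("mixed_scripts")
    # A brand that only appears after mapping digits back to letters.
    normalized = host.translate(DIGIT_LOOKALIKES)
    if any(b in normalized and b not in host for b in BRANDS):
        flags.append("digit_lookalike")
    # A brand name in a subdomain while the registered domain is something else.
    subdomains = labels[:-2]
    registered = labels[-2] if len(labels) >= 2 else ""
    if registered not in BRANDS and any(b in s for b in BRANDS for s in subdomains):
        flags.append("brand_in_subdomain")
    if len(PERCENT_ESCAPE.findall(url)) >= MAX_ESCAPES:
        flags.append("excessive_encoding")
    return flags
```

For the example from the list, `url_pattern_flags("http://paypal.login.scamsite.com/")` returns `["brand_in_subdomain"]`, while `https://www.paypal.com/` produces no flags.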

Conclusion

By combining these three clues, namely the shady language (NLP), the fishy link patterns (Pattern Analysis), and the suspicious official records (WHOIS & Safe Browsing), the AI is designed to catch more scams with fewer false alarms than any single check, providing a stronger, faster defense against online fraud. Because a message has to look clean on all three fronts, this multi-layered, semantic-driven approach is harder to evade than single-factor keyword matching or URL blacklisting.
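The article does not state how the three modules are weighted, so the following is only one plausible way to fuse them into a single risk score. The `Signals` fields, the weights, the 30-day threshold, and the rule that a Safe Browsing hit overrides everything else are all assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical weights; they sum to 1 so the score stays in [0, 1].
WEIGHTS = {"text": 0.4, "new_domain": 0.3, "patterns": 0.3}
NEW_DOMAIN_DAYS = 30


@dataclass
class Signals:
    text_score: float                 # 0..1 from the text module
    domain_age_days: Optional[int]    # None when WHOIS returned no creation date
    safe_browsing_hit: bool           # True if the URL is on a Safe Browsing list
    pattern_flag_count: int           # number of structural red flags in the URL


def risk_score(s: Signals) -> float:
    """Combine the module outputs into one score in [0, 1]."""
    # A confirmed blocklist hit is treated as decisive in this sketch.
    if s.safe_browsing_hit:
        return 1.0
    is_new = s.domain_age_days is not None and s.domain_age_days < NEW_DOMAIN_DAYS
    patterns = min(s.pattern_flag_count, 2) / 2   # saturate at two flags
    score = (WEIGHTS["text"] * s.text_score
             + WEIGHTS["new_domain"] * (1.0 if is_new else 0.0)
             + WEIGHTS["patterns"] * patterns)
    return round(min(score, 1.0), 3)
```

Keeping the fusion step separate from the modules makes it easy to retune the weights, or to swap the weighted sum for a learned model, without touching the individual checks.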

Practical Use & Implementation

This system has been deployed as a core feature of the Stremini AI initiative. Specifically, a website trust score system that analyzes URLs has been built and deployed using the WHOIS API.

You can verify this implementation by sending a query with a URL parameter to the endpoint below:

https://websitetrustscore.vishwajeetadkine705.workers.dev/
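A query against the endpoint could be built as shown below. The article does not name the query parameter or describe the response format, so the parameter name `url` is a guess and the response is returned as raw text; adjust both to whatever the deployed service actually expects.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "https://websitetrustscore.vishwajeetadkine705.workers.dev/"


def build_query(target_url: str, param: str = "url") -> str:
    """Return the endpoint URL with the target percent-encoded as a query parameter.

    The parameter name "url" is an assumption, not documented in the article.
    """
    return f"{ENDPOINT}?{urlencode({param: target_url})}"


def fetch_trust_score(target_url: str, timeout: float = 10.0) -> str:
    """Send the query and return the raw response body as text."""
    with urlopen(build_query(target_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(build_query("https://example.com"))
```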
