Smishi Blog | Retro Small Build

 ____  __  __ ___ ____  _   _ ___ 
/ ___||  \/  |_ _/ ___|| | | |_ _|
\___ \| |\/| || |\___ \| |_| || | 
 ___) | |  | || | ___) |  _  || | 
|____/|_|  |_|___|____/|_| |_|___|

  NE NASEDAJ — Don't Fall For It

🎣 Smishi — an SMS phishing detector for Serbian, Bosnian, Croatian & Montenegrin. Built for the Build Small Hackathon, June 2026. 🟢

🚨 [POST 1] Smishing is up 1,300% — and your filter doesn't speak Serbian

📅 13. JUN 2026. | 👤 metalalchemistspex | #problem #bhs

Smishing has surged over 1,300% in Serbia in the last three years. Messages impersonating Pošta Srbije, banks, and the traffic police land on phones daily — fake fines, fake parcels, fake prizes, all designed to make you click before you think.

Here's the problem: most major phishing detectors are trained on English. Run a Serbian SMS through one and it often just... shrugs. Not because the scam isn't obvious to a human, but because the model has never seen the specific tricks that work in Serbian, Bosnian, Croatian, and Montenegrin (BHS/SBCM):

🔤 Case inflection: nagrada, nagradu, nagradi, nagradom are all the same "prize" lure, just in different grammatical cases. Four strings, one scam.
🔀 Script duality: the same message can be Cyrillic or Latin, and the two can be mixed. A Cyrillic "а" (U+0430) looks identical to a Latin "a" (U+0061) to your eyes, but not to a keyword filter.
📭 No dataset: we looked, and found no public, labeled BHS smishing dataset anywhere. So we built one: 1,529 messages, phishing + legitimate, Cyrillic + Latin.
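
To make the first two tricks concrete, here is a minimal Python sketch of how a filter could fold Cyrillic into Latin and then match a lure by its stem instead of by exact word. It is an illustration, not Smishi's actual code: the names `CYR_TO_LAT`, `to_latin`, `LURE_STEMS`, and `lure_hits` are made up for this example, and the one-entry stem list is an assumption.

```python
# Serbian Cyrillic -> Latin, lowercase only (input is lowercased first).
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}

# Hypothetical stem list: "nagrad" covers nagrada, nagradu, nagradi, nagradom.
LURE_STEMS = ("nagrad",)


def to_latin(text: str) -> str:
    """Lowercase the text and map every Cyrillic letter to its Latin form."""
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in text.lower())


def lure_hits(text: str) -> list[str]:
    """Return the words (punctuation stripped) that start with a lure stem."""
    hits = []
    for raw in to_latin(text).split():
        word = raw.strip(".,!?:;\"'()")
        if word.startswith(LURE_STEMS):
            hits.append(word)
    return hits


if __name__ == "__main__":
    print(lure_hits("Честитамо! Освојили сте награду."))
    print(lure_hits("Osvojili ste nagradu!"))
```

Both calls should report the same hit, because the Cyrillic message is folded to Latin before matching. Stem matching is crude (it would also catch unrelated words sharing the prefix), which is one reason to prefer learned features over keyword lists.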

That's the gap. Smishi is our attempt to close it.

⚙️ [POST 2] 110 million parameters walk into a phishing filter

📅 14. JUN 2026. | 👤 metalalchemistspex | #architecture #bertic

Under the hood, Smishi is an ensemble — two models plus rule-based heuristics, all running on CPU:

Model A: TF-IDF (character n-grams, 3–5) + Logistic Regression. Fast, simple, surprisingly solid baseline.
Model B: fine-tuned BERTić — 110 million parameters. Not 70B, not 8B. 110 million. Small enough to run on a laptop, fine-tuned specifically for BHS phishing. 96.96% accuracy, 96.3% F1.
Heuristics: suspicious/typosquatted domains, message length, urgency-language flags.
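
Why character n-grams for Model A? A rough, stdlib-only sketch of the intuition: inflected forms of the same word share most of their 3 to 5 character substrings, so they land close together in feature space even though they are different strings. This only shows the feature extraction idea; the TF-IDF weighting and the logistic regression are left out, and `char_ngrams` and `overlap` are names invented for this example.

```python
def char_ngrams(text: str, n_min: int = 3, n_max: int = 5) -> set[str]:
    """All character substrings of length n_min..n_max, lowercased."""
    text = text.lower()
    return {
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    }


def overlap(a: str, b: str) -> float:
    """Jaccard similarity between the n-gram sets of two strings."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    union = grams_a | grams_b
    if not union:  # both strings shorter than n_min
        return 0.0
    return len(grams_a & grams_b) / len(union)


if __name__ == "__main__":
    print(overlap("nagradu", "nagradom"))  # two cases of the same lure
    print(overlap("nagradu", "paket"))     # unrelated word
```

The two "prize" forms share half of their n-grams, while the unrelated word shares none. A word-level vocabulary would treat all three as equally different.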

Both models' predictions and confidence scores show up side by side in the UI, along with which red flags fired. Not just "phishing: yes" — "phishing: yes, because fake URL + urgency keyword + impersonation pattern."
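
The "which red flags fired" part can be sketched in a few lines of Python. Everything here is an assumption for illustration: the allowlist `KNOWN_DOMAINS`, the `URGENCY_STEMS` list, the 0.8 similarity threshold, and the function names are not taken from Smishi's code, and a real rule set would be much longer.

```python
import re
from difflib import SequenceMatcher

# Hypothetical allowlist of legitimate sender domains.
KNOWN_DOMAINS = ("posta.rs",)
# Hypothetical urgency stems: "hitno" (urgent), "odmah" (immediately).
URGENCY_STEMS = ("hitno", "odmah")


def domains(text: str) -> list[str]:
    """Extract host names from http(s) URLs in the message."""
    return re.findall(r"https?://([^/\s]+)", text.lower())


def looks_typosquatted(domain: str, threshold: float = 0.8) -> bool:
    """True if the domain is close to, but not equal to, a known domain."""
    if domain in KNOWN_DOMAINS:
        return False
    return any(
        SequenceMatcher(None, domain, known).ratio() >= threshold
        for known in KNOWN_DOMAINS
    )


def red_flags(text: str) -> list[str]:
    """Return human-readable names of the heuristics that fired."""
    flags = []
    if any(looks_typosquatted(d) for d in domains(text)):
        flags.append("typosquatted domain")
    words = re.findall(r"\w+", text.lower())
    if any(w.startswith(URGENCY_STEMS) for w in words):
        flags.append("urgency language")
    return flags


if __name__ == "__main__":
    print(red_flags("Hitno! Platite kaznu: http://p0sta.rs/uplata"))
```

Returning the flag names rather than a bare yes/no is what lets a UI say why a message looks suspicious.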

A favorite test case — a message reading "MUP: Sаobraćajni prekršaj evidentiran..." — has a Cyrillic "а" hiding inside otherwise-Latin text. Same pixels, different Unicode code point (U+0430 vs U+0061). Smishi catches it. Most keyword filters wouldn't even notice it's there.

📊 [POST 3] Don't trust us — run the numbers yourself

📅 15. JUN 2026. | 👤 metalalchemistspex | #results #demo

We built a 105-case test set covering homographs, typosquatting, morphological case variants, and no-link IBAN scams — and it's downloadable directly from the app's batch-test section. Upload it back in, hit run, and watch the results table populate in real time.

Current score: 93.3% (97/105). Most of what we miss is no-link phishing — scams that rely on IBAN numbers or pure social pressure instead of a clickable link. That's the next thing on the list, and we're saying so out loud rather than hiding it.
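
If you want to see where the misses cluster, a per-category tally is enough. Here is a small Python sketch under the assumption that each batch-test row carries a category, an expected label, and a predicted label; that row layout, the `score` function, and the toy rows below are invented for illustration and are not Smishi's real results.

```python
from collections import defaultdict


def score(rows):
    """rows: iterable of (category, expected_label, predicted_label).

    Returns (overall_accuracy, {category: (correct, total)}).
    """
    totals = defaultdict(lambda: [0, 0])  # category -> [correct, total]
    for category, expected, predicted in rows:
        totals[category][1] += 1
        if expected == predicted:
            totals[category][0] += 1
    correct = sum(c for c, _ in totals.values())
    total = sum(t for _, t in totals.values())
    overall = correct / total if total else 0.0
    return overall, {cat: (c, t) for cat, (c, t) in totals.items()}


if __name__ == "__main__":
    # Toy rows, made up for this example.
    rows = [
        ("homograph", "phishing", "phishing"),
        ("typosquat", "phishing", "phishing"),
        ("no-link", "phishing", "legit"),
        ("no-link", "phishing", "phishing"),
    ]
    overall, per_category = score(rows)
    print(f"overall: {overall:.1%}")
    for category, (correct, total) in per_category.items():
        print(f"{category}: {correct}/{total}")
```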

And then, mid-build, this happened: one of us got a real SMS — fake traffic police, fabricated case number, citing an actual article of law, 24-hour payment deadline. Not a training example. Not a test case. Just... Tuesday. Ne nasedaj. 🛡️

📌 🎥 Loom Video Demo | 🤗 🎣 Live Space | 🧠 Model Card | 📁 🐙 GitHub

🟢 Status: SUBMITTED | Visitors: 1337