📟 Smishi Blog.exe - Build Small Hackathon [X] [_] [?]
 ____  __  __ ___ ____  _   _ ___ 
/ ___||  \/  |_ _/ ___|| | | |_ _|
\___ \| |\/| || |\___ \| |_| || | 
 ___) | |  | || | ___) |  _  || | 
|____/|_|  |_|___|____/|_| |_|___|

  NE NASEDAJ - Don't Fall For It

🎣 Smishi: an SMS phishing detector for Serbian, Bosnian, Croatian & Montenegrin. Built for the Build Small Hackathon, June 2026. 🟢

🚨 [POST 1] Smishing is up 1,300%, and your filter doesn't speak Serbian

📅 13. JUN 2026. | 👤 metalalchemistspex | #problem #bhs

Smishing has surged over 1,300% in Serbia in the last three years. Messages impersonating Pošta Srbije, banks, and the traffic police land on phones daily: fake fines, fake parcels, fake prizes, all designed to make you click before you think.

Here's the problem: most mainstream phishing detectors are trained mainly on English. Run a Serbian SMS through one and it often just... shrugs. Not because the scam isn't obvious to a human, but because the model has never seen the specific tricks that work in Serbian, Bosnian, Croatian, and Montenegrin (BHS/SBCM):

  • 🔀 Case inflection: nagrada, nagradu, nagradi, nagradom are all the same "prize" lure, just in different grammatical cases. Four strings, one scam.
  • 🔀 Script duality: the same message can be Cyrillic or Latin, and the two can be mixed. A Cyrillic "а" looks identical to a Latin "a" to your eyes, but not to a keyword filter.
  • 📭 No dataset: we looked. We couldn't find a public, labeled BHS smishing dataset anywhere. So we built one. 1,529 messages, phishing + legitimate, Cyrillic + Latin.
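To make the first two bullets concrete, here is a rough Python sketch (not Smishi's actual code). It folds Serbian Cyrillic into Latin and then matches on a stem instead of an exact keyword. The mapping table, the `nagrad` stem, and both function names are our own assumptions for illustration.

```python
# Hypothetical helper table: Serbian Cyrillic (lowercase) to Latin.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}


def to_latin(text: str) -> str:
    """Lowercase the text and replace every Cyrillic letter with its Latin form."""
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in text.lower())


def has_prize_lure(text: str, stem: str = "nagrad") -> bool:
    """Match the stem so that every case form (nagrada, nagradu, ...) is caught."""
    return stem in to_latin(text)


samples = [
    "Osvojili ste nagradu!",      # plain Latin
    "Освојили сте награду!",      # full Cyrillic
    "Preuzmite nagr\u0430du",     # Latin text with one Cyrillic "а" (U+0430)
]
for sms in samples:
    print(has_prize_lure(sms))
```

A plain `"nagrada" in sms` check would miss all three samples; the folded stem match catches them. Stem matching this crude will also over-match, which is one reason a learned model is worth having on top.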

That's the gap. Smishi is our attempt to close it.

βš™οΈ [POST 2] 110 million parameters walk into a phishing filter

📅 14. JUN 2026. | 👤 metalalchemistspex | #architecture #bertic

Under the hood, Smishi is an ensemble of two models plus rule-based heuristics, all running on CPU:

  • Model A: TF-IDF (character n-grams, 3–5) + Logistic Regression. Fast, simple, surprisingly solid baseline.
  • Model B: fine-tuned BERTić, 110 million parameters. Not 70B, not 8B. 110 million. Small enough to run on a laptop, fine-tuned specifically for BHS phishing. 96.96% accuracy, 96.3% F1.
  • Heuristics: suspicious/typosquatted domains, message length, urgency-language flags.
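Why character n-grams for Model A? A small stdlib-only sketch shows the intuition: inflected forms of the same word share most of their 3–5 character n-grams, so the features overlap even when the whole word doesn't match. The function names are ours, and this only illustrates the feature idea; it is not the trained pipeline. In scikit-learn, features like these are typically built with `TfidfVectorizer(analyzer="char", ngram_range=(3, 5))`.

```python
def char_ngrams(word: str, n_min: int = 3, n_max: int = 5) -> set[str]:
    """Return the set of all character n-grams of length n_min..n_max."""
    word = word.lower()
    return {
        word[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(word) - n + 1)
    }


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of n-grams the two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


forms = ["nagrada", "nagradu", "nagradi", "nagradom"]
base = char_ngrams(forms[0])
for form in forms[1:]:
    # Print the overlap between "nagrada" and each other case form.
    print(form, round(jaccard(base, char_ngrams(form)), 2))
```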

Both models' predictions and confidence scores show up side by side in the UI, along with which red flags fired. Not just "phishing: yes" but "phishing: yes, because fake URL + urgency keyword + impersonation pattern."
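A toy version of the "which red flags fired" idea might look like the sketch below. Everything in it is an assumption for illustration: the urgency stems, the allow-list of official domains, the 0.8 similarity threshold, and the flag names are ours, not Smishi's real rule set.

```python
import re
from difflib import SequenceMatcher

# Hypothetical rule data, for illustration only.
URGENCY_STEMS = ("hitno", "odmah")
OFFICIAL_DOMAINS = {"posta.rs", "mup.gov.rs"}


def red_flags(sms: str) -> list[str]:
    """Return the names of the heuristic rules that fired for this message."""
    flags = []
    lowered = sms.lower()
    if any(stem in lowered for stem in URGENCY_STEMS):
        flags.append("urgency_language")
    # Pull the host part out of every http(s) link in the message.
    for host in re.findall(r"https?://([^/\s]+)", lowered):
        if host in OFFICIAL_DOMAINS:
            continue
        # A host that is almost, but not exactly, an official domain is a lookalike.
        if any(SequenceMatcher(None, host, d).ratio() >= 0.8 for d in OFFICIAL_DOMAINS):
            flags.append("lookalike_domain")
        else:
            flags.append("unknown_domain")
    return flags


print(red_flags("HITNO: platite kaznu na http://p0sta.rs/kazna"))
```

Returning flag names instead of a single boolean is what lets a UI explain its verdict.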

A favorite test case: a message reading "MUP: Sаobraćajni prekršaj evidentiran..." (roughly "MUP: Traffic violation recorded...") has a Cyrillic "а" hiding inside otherwise-Latin text. Same pixels, different Unicode code point (U+0430 vs U+0061). Smishi catches it. Most keyword filters wouldn't even notice it's there.
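One simple way to surface that kind of homograph, sketched with Python's standard `unicodedata` module, is to flag any single word that mixes Latin and Cyrillic letters. This is a minimal illustration under our own assumptions (function names included), not a description of how Smishi detects it internally.

```python
import unicodedata


def scripts_in(token: str) -> set[str]:
    """Collect which of the two scripts (LATIN, CYRILLIC) appear in a token."""
    scripts = set()
    for ch in token:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if name.startswith("CYRILLIC"):
            scripts.add("CYRILLIC")
        elif name.startswith("LATIN"):
            scripts.add("LATIN")
    return scripts


def mixed_script_tokens(text: str) -> list[str]:
    """Return whitespace-separated tokens that mix Latin and Cyrillic letters."""
    return [tok for tok in text.split() if len(scripts_in(tok)) > 1]


# The second word hides a Cyrillic "а" (U+0430); escapes make it visible in source.
sms = "MUP: S\u0430obra\u0107ajni prekr\u0161aj evidentiran"
for tok in mixed_script_tokens(sms):
    print(tok, [hex(ord(ch)) for ch in tok if ord(ch) > 0x024F])
```

A fully Cyrillic or fully Latin message passes untouched; only words that blend the two scripts are flagged.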

📊 [POST 3] Don't trust us: run the numbers yourself

📅 15. JUN 2026. | 👤 metalalchemistspex | #results #demo

We built a 105-case test set covering homographs, typosquatting, morphological case variants, and no-link IBAN scams, and it's downloadable directly from the app's batch-test section. Upload it back in, hit run, and watch the results table populate in real time.

Current score: 92.4% (97/105). Most of what we miss is no-link phishing: scams that rely on IBAN numbers or pure social pressure instead of a clickable link. That's the next thing on the list, and we're saying so out loud rather than hiding it.
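If you'd rather score a batch outside the app, a per-category tally makes it obvious where the misses cluster. The sketch below assumes a CSV with `text`, `label`, and `category` columns, and uses a deliberately naive stand-in predictor. Those column names, the sample rows, and the predictor are all hypothetical, not the real test file or model.

```python
import csv
import io
from collections import Counter


def score_batch(csv_text: str, predict) -> dict[str, tuple[int, int]]:
    """Return {category: (correct, total)} for a labeled CSV batch."""
    total, correct = Counter(), Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        total[row["category"]] += 1
        if predict(row["text"]) == row["label"]:
            correct[row["category"]] += 1
    return {cat: (correct[cat], total[cat]) for cat in total}


# Made-up rows, for illustration only.
SAMPLE = (
    "text,label,category\n"
    "Platite kaznu http://p0sta.rs,phishing,typosquat\n"
    "Uplatite na racun RS35 odmah,phishing,no_link\n"
    "Vas paket je isporucen,legit,legit\n"
)


def naive_predict(text: str) -> str:
    """Stand-in model: calls anything with a link phishing, everything else legit."""
    return "phishing" if "http" in text else "legit"


for category, (ok, n) in score_batch(SAMPLE, naive_predict).items():
    print(f"{category}: {ok}/{n}")
```

With a link-only predictor, the no-link row is the one that slips through, which is the same kind of gap described above.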

And then, mid-build, this happened: one of us got a real SMS. Fake traffic police, fabricated case number, citing an actual article of law, 24-hour payment deadline. Not a training example. Not a test case. Just... Tuesday. Ne nasedaj. 🛡️


📌 🎥 Loom Video Demo  |  🤗 🎣 Live Space  |  🧠 Model Card  |  🐙 GitHub

🟢 Status: SUBMITTED    Visitors: 1337

📡 Build Small Hackathon 2026 ⚡ Smishi Blog v1.0 | ne nasedaj 🎣