Smishi: an SMS phishing detector for Serbian, Bosnian, Croatian & Montenegrin. Built for the Build Small Hackathon, June 2026.
[POST 1] Smishing is up 1,300%, and your filter doesn't speak Serbian
13 June 2026 | metalalchemistspex | #problem #bhs
Smishing has surged over 1,300% in Serbia in the last three years. Messages impersonating Pošta Srbije, banks, and the traffic police land on phones daily: fake fines, fake parcels, fake prizes, all designed to make you click before you think.
Here's the problem: every major phishing detector is trained on English. Run a Serbian SMS through one and it often just... shrugs. Not because the scam isn't obvious to a human, but because the model has never seen the specific tricks that work in Serbian, Bosnian, Croatian, and Montenegrin (BHS/SBCM):
Case inflection: nagrada, nagradu, nagradi, nagradom are all the same "prize" lure, just in different grammatical cases. Four strings, one scam.
Script duality: the same message can be Cyrillic or Latin, and the two can be mixed. A Cyrillic "а" looks identical to a Latin "a", at least to your eyes. Not to a keyword filter.
No dataset: we looked, and couldn't find a public, labeled BHS smishing dataset anywhere. So we built one: 1,529 messages, phishing + legitimate, Cyrillic + Latin.
That's the gap. Smishi is our attempt to close it.
[POST 2] 110 million parameters walk into a phishing filter
14 June 2026 | metalalchemistspex | #architecture #bertic
Under the hood, Smishi is an ensemble of two models plus rule-based heuristics, all running on CPU:
Model B: fine-tuned BERTić, 110 million parameters. Not 70B, not 8B. 110 million. Small enough to run on a laptop, fine-tuned specifically for BHS phishing. 96.96% accuracy, 96.3% F1.
Both models' predictions and confidence scores show up side by side in the UI, along with which red flags fired. Not just "phishing: yes" but "phishing: yes, because fake URL + urgency keyword + impersonation pattern."
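The post doesn't say how the two model scores and the rule hits are combined, so the following is only a sketch of one plausible shape for an explainable verdict. Everything in it is an assumption made for illustration: the `RULES` patterns, the averaging, the `flag_bonus` and `threshold` values, and the `model_a` / `model_b` score keys are hypothetical and not taken from Smishi.

```python
import re
from dataclasses import dataclass

# Hypothetical rule set: each named red flag is a simple regex.
RULES = {
    "link present": re.compile(r"https?://\S+|www\.\S+", re.I),
    "urgency keyword": re.compile(r"\b(?:hitno|odmah)\b", re.I),
    "impersonation pattern": re.compile(r"\b(?:MUP|Pošta Srbije)\b", re.I),
}


@dataclass
class Verdict:
    is_phishing: bool
    score: float
    model_scores: dict  # per-model phishing probability, shown side by side
    red_flags: list     # names of the rules that fired


def red_flags(text: str) -> list:
    """Names of all rules whose pattern occurs in the text."""
    return [name for name, pattern in RULES.items() if pattern.search(text)]


def combine(text, model_scores, threshold=0.5, flag_bonus=0.1) -> Verdict:
    """Assumed combination: mean model score plus a small bonus per red flag."""
    flags = red_flags(text)
    mean_score = sum(model_scores.values()) / len(model_scores)
    score = min(1.0, mean_score + flag_bonus * len(flags))
    return Verdict(score >= threshold, score, dict(model_scores), flags)


def explain(verdict: Verdict) -> str:
    """Render the verdict together with the reasons behind it."""
    if not verdict.is_phishing:
        return "phishing: no"
    if not verdict.red_flags:
        return "phishing: yes (model score only)"
    return "phishing: yes, because " + " + ".join(verdict.red_flags)


if __name__ == "__main__":
    sms = "MUP: Hitno platite kaznu na http://kazne.example/plati"
    print(explain(combine(sms, {"model_a": 0.7, "model_b": 0.9})))
```

Keeping the per-model scores and the fired rules in the result object, rather than collapsing everything into one number, is what lets a UI show the "because" part.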
A favorite test case β a message reading "MUP: SΠ°obraΔajni prekrΕ‘aj evidentiran..." β has a Cyrillic "Π°" hiding inside otherwise-Latin text. Same pixels, different Unicode code point (U+0430 vs U+0061). Smishi catches it. Most keyword filters wouldn't even notice it's there.
[POST 3] Don't trust us: run the numbers yourself
15 June 2026 | metalalchemistspex | #results #demo
We built a 105-case test set covering homographs, typosquatting, morphological case variants, and no-link IBAN scams, and it's downloadable directly from the app's batch-test section. Upload it back in, hit run, and watch the results table populate in real time.
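For readers who would rather score a batch outside the app, a run like this can be approximated in a few lines. This is a sketch under assumptions: the column names `text`, `label`, and `category`, the three sample rows, and the `toy_predict` stand-in classifier are all hypothetical, and the real test file's layout may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical three-row sample; the real test set's columns may differ.
SAMPLE_CSV = """text,label,category
Vaš paket čeka: http://posta.example/p,phishing,typosquatting
Uplatite kaznu na IBAN RS00 0000 0000,phishing,no-link IBAN
Vidimo se sutra,legitimate,benign
"""


def toy_predict(text: str) -> str:
    """Stand-in classifier: flags any message that contains a link."""
    return "phishing" if "http" in text.lower() else "legitimate"


def run_batch(csv_text: str, predict) -> dict:
    """Score every row and count the misses per test category."""
    total = correct = 0
    misses = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        total += 1
        if predict(row["text"]) == row["label"]:
            correct += 1
        else:
            misses[row["category"]] += 1
    accuracy = correct / total if total else 0.0
    return {
        "total": total,
        "correct": correct,
        "accuracy": accuracy,
        "misses": dict(misses),
    }


if __name__ == "__main__":
    print(run_batch(SAMPLE_CSV, toy_predict))
```

Breaking the misses down by category is the useful part: a link-based toy classifier like the one above fails exactly on the no-link case, and the per-category count makes that visible instead of hiding it inside one accuracy number.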
Current score: 93.3% (97/105). Most of what we miss is no-link phishing: scams that rely on IBAN numbers or pure social pressure instead of a clickable link. That's the next thing on the list, and we're saying so out loud rather than hiding it.
And then, mid-build, this happened: one of us got a real SMS. Fake traffic police, fabricated case number, citing an actual article of law, 24-hour payment deadline. Not a training example. Not a test case. Just... Tuesday. Ne nasedaj ("Don't fall for it").