What you need to know
- Gmail now features a new text vectorizer called RETVec, which results in 38% better spam detection.
- Text vectorizers help identify the letters and symbols in emails so that messages can be sorted as spam accordingly.
- Some spam senders manipulate letters and symbols, use homoglyphs, add invisible characters, and use keyword stuffing to try and bypass spam filters.
Spam detection in Gmail should improve thanks to a back-end upgrade to text identification across some Google services. With the security upgrade in place, Google says that Gmail is now 38% better at detecting spam.
The company announced the update recently in a Google Security blog post (via 9to5Google). Before that, it had been tested internally at Google for the past year. It represents one of the "largest defense upgrades in recent years," the company says.
The new addition to Gmail spam detection is RETVec, which stands for Resilient & Efficient Text Vectorizer. Text vectorizers are used to identify the content of an email, which is sometimes disguised by the sender. Spammers do this by manipulating letters and symbols, using homoglyphs (different characters that appear similar), adding invisible characters, and stuffing keywords to try to bypass spam filters.
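To see why these tricks work, consider a minimal Python sketch of a naive keyword filter. This is not RETVec or anything Gmail uses; the blocklist, the function name `naive_is_spam`, and the sample messages are all hypothetical and exist only to illustrate the evasion techniques described above.

```python
# Hypothetical blocklist, for illustration only.
BLOCKED = {"free", "prize"}

def naive_is_spam(text: str) -> bool:
    """Flag text if any whitespace-separated word exactly matches a blocked keyword."""
    return any(word.lower() in BLOCKED for word in text.split())

# The same message written three ways. All three look alike on screen.
plain = "claim your free prize"
homoglyph = "claim your fr\u0435\u0435 priz\u0435"  # Cyrillic U+0435 replaces Latin 'e'
invisible = "claim your fr\u200bee pri\u200bze"     # zero-width space U+200B inside words

print(naive_is_spam(plain))      # True
print(naive_is_spam(homoglyph))  # False
print(naive_is_spam(invisible))  # False
```

Exact string matching misses the second and third messages even though a reader would see the same words, which is the kind of gap a more resilient vectorizer is meant to close.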
"RETVec achieves these improvements by sporting a very lightweight word embedding model (~200k parameters)," Google said in the post. "Allowing us to reduce the Transformer model's size at equal or better performance, and having the ability to split the computation between the host and TPU in a network and memory efficient manner."
The biggest benefit of RETVec is that it is 38% better at detecting spam, but there are plenty of other improvements as well. That accuracy improvement includes a reduction in false positives by nearly 20% and in false negatives by nearly 18%. A false negative occurs when Gmail's spam detector fails to catch a spam email, and a false positive occurs when a valid email is incorrectly sorted as spam.
Because Google managed to reduce the size of the Transformer model, using RETVec lowered Tensor Processing Unit (TPU) usage by 83%. That's a significant efficiency benefit of employing this new text vectorizer in Gmail.
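For readers who want to see how such figures are typically computed, here is a small Python sketch. It assumes the percentages are relative reductions in error rates, which Google's post may or may not define in exactly this way. The message counts are invented purely so the arithmetic lands near the reported numbers; they are not Google's data, and the function names are hypothetical.

```python
def error_rates(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (false positive rate, false negative rate) from confusion counts."""
    fpr = fp / (fp + tn)  # share of valid mail wrongly sent to spam
    fnr = fn / (fn + tp)  # share of spam that reached the inbox
    return fpr, fnr

def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a rate dropped relative to its starting value."""
    return (before - after) / before

# Invented counts for 1,000 spam and 1,000 valid messages (not real data).
fpr_old, fnr_old = error_rates(tp=900, fp=50, tn=950, fn=100)
fpr_new, fnr_new = error_rates(tp=918, fp=40, tn=960, fn=82)

fp_cut = relative_reduction(fpr_old, fpr_new)
fn_cut = relative_reduction(fnr_old, fnr_new)

print(f"False positives cut by {fp_cut:.0%}")  # 20%
print(f"False negatives cut by {fn_cut:.0%}")  # 18%
```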
RETVec was developed by Google Research, and it's entirely open-source. After Google's lengthy in-house testing period, the company found it to be "highly effective for security and anti-abuse applications."
People looking to use RETVec for their own applications can follow a tutorial from Google that explains how to get started.