Newly developed algorithms can fool text-based AI models that use natural language processing (NLP) to gauge sentiment. These algorithms, created by IBM researchers Pin-Yu Chen and Lingfei Wu, could be used to manipulate the behavior of spam filters, fake-news detectors, and sentiment analysis models.
Instead of changing just individual words, the new algorithms rewrite whole sentences. “This gives the attack a larger space by creating sequences that are semantically similar to the target sentence. We then see if the model classifies them like the original sentence,” Chen says.
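The loop Chen describes can be sketched in a few lines. The toy code below is a hypothetical illustration, not the IBM researchers' actual system: the synonym table stands in for a real paraphrase generator, the Jaccard word-overlap score stands in for a learned semantic-similarity measure, and the keyword classifier stands in for a real sentiment model. The names (`classify`, `paraphrases`, `attack`) and the similarity threshold are all assumptions made for this sketch.

```python
# Toy sketch of a paraphrasing attack: generate candidates that stay
# semantically close to the original sentence, keep any candidate the
# target model classifies differently.
from itertools import product

# Stand-in target model: a naive keyword-based sentiment classifier.
POSITIVE = {"great", "excellent", "wonderful"}
NEGATIVE = {"bad", "terrible", "awful"}

def classify(sentence):
    words = set(sentence.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "negative" if score < 0 else "positive"

# Stand-in paraphrase generator: per-word synonym substitution.
SYNONYMS = {
    "movie": ["film", "picture"],
    "terrible": ["dreadful", "poor"],
    "was": ["seemed"],
}

def paraphrases(sentence):
    words = sentence.lower().split()
    options = [[w] + SYNONYMS.get(w, []) for w in words]
    for combo in product(*options):
        yield " ".join(combo)

def similarity(a, b):
    # Crude proxy for semantic similarity: Jaccard word overlap.
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def attack(sentence, min_similarity=0.5):
    original_label = classify(sentence)
    for candidate in paraphrases(sentence):
        if candidate == sentence:
            continue
        if similarity(sentence, candidate) < min_similarity:
            continue
        if classify(candidate) != original_label:
            return candidate  # found an adversarial paraphrase
    return None

adv = attack("the movie was terrible")
print(adv)  # -> "the movie was dreadful": reads the same to a person,
            #    but the keyword model no longer sees it as negative
```

The weakness exploited here is the same one the article describes: the model keys on surface features, so a rewrite that preserves meaning for a human reader can still move the input outside what the model recognizes.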
TechTalks founder Ben Dickson tells us more about paraphrasing attacks in this report from VentureBeat:
The key to the success of paraphrasing attacks is that they are imperceptible to humans, since they preserve the context and meaning of the original text. “We gave the original paragraph and modified paragraph to human evaluators, and it was very hard for them to see differences in meaning. But for the machine, it was completely different,” Wu says.
AI researcher Stephen Merity points out that paraphrasing attacks don’t need to be perfectly coherent to humans, especially when they’re not anticipating a bot tampering with the text. “Humans aren’t the correct level to try to detect these kinds of attacks, because they deal with faulty input every day. Except that for us, faulty input is just incoherent sentences from real people,” he says. “When people see typos right now, they don’t think it’s a security issue. But in the near future, it might be something we will have to contend with.”
Merity also points out that paraphrasing and adversarial attacks will give rise to a new trend in security risks. “A lot of tech companies rely on automated decisions to classify content, and there isn’t actually a human-to-human interaction involved. This makes the process vulnerable to such attacks,” Merity says. “It will run in parallel to data breaches, except that we’re going to find logic breaches.”
For instance, a person might fool a hate-speech classifier to approve their content, or exploit paraphrasing vulnerabilities in a resume-processing model to push their job application to the top of the list. “These types of issues are going to be a new security era, and I’m worried companies will spend as little on this as they do on security, because they’re focused on automation and scalability,” Merity warns.