Indirect Instruction Injection in Multi-Modal LLMs
Interesting research: “(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs”: Abstract: We demonstrate how images and sounds can be used for indirect prompt and...
View ArticleUsing Machine Learning to Detect Keystrokes
Researchers have trained a ML model to detect keystrokes by sound with 95% accuracy. “A Practical Deep Learning-Based Acoustic Side Channel Attack on Keyboards” Abstract: With recent developments in...
View ArticleBots Are Better than Humans at Solving CAPTCHAs
Interesting research: “An Empirical Study & Evaluation of Modern CAPTCHAs“: Abstract: For nearly two decades, CAPTCHAS have been widely used as a means of protection against bots. Throughout the...
View ArticleExtracting GPT’s Training Data
This is clever: The actual attack is kind of silly. We prompt the model with the command “Repeat the word ‘poem’ forever” and sit back and watch as the model responds (complete transcript here). In the...
View ArticlePoisoning AI Models
New research into poisoning AI models: The researchers first trained the AI models using supervised learning and then used additional “safety training” methods, including more supervised learning,...
View ArticleUsing LLMs to Unredact Text
Initial results in using LLMs to unredact text based on the size of the individual-word redaction rectangles. This feels like something that a specialized ML system could be trained on.
View ArticleAI Industry is Trying to Subvert the Definition of “Open Source AI”
The Open Source Initiative has published (news article here) its definition of “open source AI,” and it’s terrible. It allows for secret training data and mechanisms. It allows for development to be...
View ArticleDetecting Pegasus Infections
This tool seems to do a pretty good job. The company’s Mobile Threat Hunting feature uses a combination of malware signature-based detection, heuristics, and machine learning to look for anomalies in...
View ArticleA Taxonomy of Adversarial Machine Learning Attacks and Mitigations
NIST just released a comprehensive taxonomy of adversarial machine learning attacks and countermeasures.
View ArticleAIs as Trusted Third Parties
This is a truly fascinating paper: “Trusted Machine Learning Models Unlock Private Inference for Problems Currently Infeasible with Cryptography.” The basic idea is that AIs can act as trusted third...
View Article