Visual browser automation with OmniParser

Overview: UI and browser automation tests can be brittle because they hook into implementation details of the UI that may be irrelevant to actual user interaction. Visual test automation is more robust because it uses the UI the same way a user would. This post explains a solution for visual browser automation. …

Running large LLMs on small hardware: Gemma 4 12B on a VRAM-constrained Radeon laptop

Google released Gemma 4 12B today. I'm a huge fan of the Gemma model family: the models have improved with each iteration and consistently perform on par with larger models. It didn't run at first because it needs more VRAM than my laptop has, but there's a workaround. Here are short instructions for how to run …

OPAW: Real-Time Target Sound Extraction

In this instalment of "One Paper a Week", we're looking at Waveformer, a neural network for extracting specific waveforms from a sound mix in real time. If you're thinking "Independent Component Analysis", you're not alone: ICA can also extract a desired signal from a mix of signals (similarly to how we are able to understand a …

Restricting VS Code terminal commands to an approved commands list

Motivation: If you've ever needed to restrict which commands can be run inside a VS Code integrated terminal (nowadays mainly to prevent agents from wreaking havoc), you can achieve this using a combination of VS Code terminal profiles and PowerShell's PSReadLine module. I'm not sure if or how this works with other terminals; however, I've …

OPAW: Tracking Capabilities for Safer Agents

With AI agents rampaging on half the population's computers, there is increased interest in safeguarding AI agent workflows. In "Tracking Capabilities for Safer Agents", none other than Martin Odersky et al. propose a framework for running AI agents subject to security policies. The answer is, of course, Scala. I'm skipping the problem …