Visual browser automation with OmniParser

Overview: UI and browser automation tests can be brittle because they hook into implementation details of the UI that may not be relevant to actual user interaction. Visual test automation is more robust because it uses the UI the way a user would. This post explains a solution for visual browser automation. …

Running large LLMs on small hardware: Gemma 4 12B on a VRAM-constrained Radeon laptop

Google released Gemma 4 12B today. I'm a huge fan of the Gemma model family: the models have improved with each iteration and consistently perform on par with larger models. It didn't run at first because it needs more VRAM than my laptop has, but there's a workaround. Here are short instructions for how to run …

OPAW: Real-Time Target Sound Extraction

In this instalment of "One Paper a Week", we're looking at Waveformer, a neural network for extracting specific waveforms from a sound mix in real time. If you're thinking "Independent Component Analysis", you're not alone: ICA can also extract a desired signal from a mix of signals (similarly to how we are able to understand a …

OPAW: Tracking Capabilities for Safer Agents

With AI agents rampaging on half the population's computers, there is increased interest in safeguarding AI agent workflows. In "Tracking Capabilities for Safer Agents", none other than Martin Odersky et al. propose a framework for running AI agents subject to security policies. The answer is, of course, Scala. I'm skipping the problem …

LLMs and the Extended Mind Thesis

TL;DR: a decentralised, autonomous, stealthy AI could form through LLM instances manipulating people into transporting information between them. The extended mind thesis [EMT] deals with how an intelligent system uses its surrounding environment for information processing. A simple example would be taking notes on paper, a more …