Google Releases Gemini 3.5 Flash Computer Use and Gemma 4 12B Local Model
Google has integrated computer use capabilities into Gemini 3.5 Flash, allowing developers to build custom agents that automate desktop and browser actions. Additionally, the new Gemma 4 12B open model runs locally on 16GB of memory with native vision and voice processing.
Impact: High
Why it matters
You can now build cost-effective, multi-step automation agents via the Gemini API or run private, multimodal workflows entirely offline.
TL;DR
- Gemini 3.5 Flash now supports agentic computer use, ideal for building automated GUI testers.
- Gemma 4 12B runs multimodal tasks locally on 16GB RAM laptops, keeping data fully private.
- Gemini 3.5 Live Translate is available in public preview for real-time speech translation in 70+ languages.
Key facts
- Gemma 4 RAM Requirement: 16GB
- Live Translate Languages: 70+
Building Agents with Gemini 3.5 Flash Computer Use
With computer use added to gemini-3.5-flash, developers can build autonomous agents that visually parse and interact with system interfaces. The system interprets the current screen state, reasons about the next step, and executes mouse and keyboard inputs. That makes it a good fit for enterprise automation and continuous regression testing.
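The interpret, reason, execute cycle described above can be sketched as a plain loop. This is a minimal illustration, not Google's implementation: the `Action` shape, `run_agent`, and the three callables are hypothetical names, and the model is replaced by a scripted stub so the example runs without an API key.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    """One UI action proposed by the model (hypothetical shape)."""
    kind: str      # e.g. "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


def run_agent(
    goal: str,
    capture_screen: Callable[[], bytes],
    propose_action: Callable[[str, bytes, list], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> list:
    """Observe the screen, ask the model for an action, execute it, repeat."""
    history = []
    for _ in range(max_steps):          # hard cap so a confused agent cannot loop forever
        screenshot = capture_screen()   # 1. interpret the current screen state
        action = propose_action(goal, screenshot, history)  # 2. reason about the next step
        if action.kind == "done":
            break
        execute(action)                 # 3. perform the mouse or keyboard input
        history.append(action)
    return history


# Scripted stand-in for the model: click once, type once, then finish.
def fake_model(goal, screenshot, history):
    script = [Action("click", x=40, y=700), Action("type", text="github.com")]
    return script[len(history)] if len(history) < len(script) else Action("done")


executed = []
steps = run_agent("open github.com", lambda: b"", fake_model, executed.append)
print([a.kind for a in steps])  # ['click', 'type']
```

In a real agent, `propose_action` would call the model with the screenshot and `execute` would drive the browser or desktop. The step cap and a sandboxed environment are the two guards worth keeping.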
Offline Multimodal Workflows with Gemma 4 12B
Gemma 4 12B offers a private, local alternative for edge deployment. Key specifications and features include:
- RAM Requirement: Runs locally on consumer laptops using just 16GB of memory.
- Multimodal Architecture: Features a unified design with native vision and real-time voice processing within a single stream.
- Deployment: Targeted at developers needing strict offline privacy and low-latency interaction.
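To see what the 16GB figure implies, a back-of-the-envelope estimate of weight memory helps. The announcement does not state which numeric precision the figure assumes, so the bit widths below are illustrative assumptions. The calculation covers weights only and ignores the KV cache, activations, and the operating system.

```python
def weights_gib(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


for bits in (16, 8, 4):
    print(f"{bits}-bit: {weights_gib(12, bits):.1f} GiB")
# 16-bit: 22.4 GiB
# 8-bit: 11.2 GiB
# 4-bit: 5.6 GiB
```

Under these assumptions, 12 billion parameters at 16-bit precision would not fit in 16GB, so a local run on that hardware most likely relies on a quantized build.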
Live Translation and API Expansion
Google also announced Gemini 3.5 Live Translate. This audio model performs speech-to-speech translation across 70+ languages, preserving the speaker's native tone and minimizing latency. It is available in public preview via the Gemini Live API, Google AI Studio, and the Google Translate mobile app.
Try it in 2 minutes
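from google import genai
from google.genai import types
client = genai.Client()  # reads the API key from the environment
response = client.models.generate_content(
    model='gemini-3.5-flash',
    contents='Automate clicking the Chrome icon and opening github.com',
    config=types.GenerateContentConfig(
        # Assumption: the tool is declared with the SDK's types.ComputerUse;
        # a bare {"computer_use": True} does not match that type.
        tools=[types.Tool(computer_use=types.ComputerUse(
            environment=types.Environment.ENVIRONMENT_BROWSER))]
    )
)
# The model returns proposed UI actions as function calls; your code must execute them.
print(response.function_calls)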
✓ When to use
- Building lightweight automation agents that need to interact with websites or desktop apps.
- Developing offline-first applications requiring secure, local image and voice reasoning.
✕ When NOT to use
- High-volume batch automation where headless API actions are faster than visual computer use.
- Running Gemma 4 locally on low-spec devices with less than 16GB of system RAM.
What to do today
- Test the computer use API in Google AI Studio using a sandbox environment.
- Download Gemma 4 12B to test local vision/voice pipeline latency on a 16GB laptop.
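For the latency test, a small timing harness is enough to get comparable numbers. This is a sketch under assumptions: `measure_latency` and `fake_inference` are hypothetical names, and the sleep stands in for a real call into whichever local runtime you use to load Gemma.

```python
import statistics
import time
from typing import Callable


def measure_latency(run_once: Callable[[], object], warmup: int = 1, runs: int = 5) -> dict:
    """Time repeated calls to run_once and summarize latency in milliseconds."""
    for _ in range(warmup):
        run_once()  # warm-up calls are not timed (model load, cache fill)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000)
    return {
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
        "runs": runs,
    }


# Placeholder workload; replace with one vision or voice request to the local model.
def fake_inference():
    time.sleep(0.01)


stats = measure_latency(fake_inference)
print(stats)
```

Reporting the median alongside the maximum keeps one slow outlier, such as a thermal throttle on a laptop, from hiding in an average.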