

I’ll further prophesy that we’ll start seeing mixed on-device and cloud model calls. Cloud for heavy thinking is probably already in the books.
Next week your local tiny gemma4 will be feeding predicted tokens to the cloud models (speculative decoding) to speed up generation and reduce the work Gemini has to do. It only has to guess right about 66% of the time for a roughly 2x speed-up.
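A back-of-the-envelope sketch of where a figure like 66% comes from. It assumes the textbook speculative-decoding model: the small model drafts `gamma` tokens per step, each is accepted independently with probability `alpha`, and the big model always contributes one token per verification pass. The function names and the `draft_cost` parameter are hypothetical, chosen for illustration, not any vendor's API.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per big-model forward pass.

    Assumes each of the gamma drafted tokens is accepted independently
    with probability alpha (a simplifying assumption).
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    # Geometric series 1 + alpha + ... + alpha**gamma: the accepted draft
    # prefix plus the one token the big model always contributes.
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


def speedup(alpha: float, gamma: int, draft_cost: float = 0.0) -> float:
    """Speed-up versus plain one-token-at-a-time decoding.

    draft_cost: time of one draft forward pass as a fraction of one
    big-model forward pass (0.0 = drafting is free, e.g. it runs on
    separate local hardware and network latency is ignored).
    """
    return expected_tokens_per_step(alpha, gamma) / (gamma * draft_cost + 1)


if __name__ == "__main__":
    for gamma in (1, 2, 4, 8):
        print(f"gamma={gamma}: {speedup(0.66, gamma):.2f}x with free drafts")
```

Under these assumptions, α = 0.66 with two drafted tokens per step gives about 2.1 tokens per big-model pass, and no amount of drafting gets past 1/(1 − α) ≈ 2.9x. Any nonzero `draft_cost` (or network round-trip) eats into that.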

I’m running gemma-4-e4b on my 8GB machine. On CPU, I’ll drop down to e2b. It’s probably the best you’ll get at this size: 140 languages, vision, and decent at agentic work. Not great at code, though.