2025 Predictions
December 31, 2024 at 11:59 PM
AI IDE Boom
- Tools like Cursor, Windsurf, and Copilot aren’t just gaining traction—they’re reshaping workflows. Copilot might close the feature gap, but it’s clear we’re in for a wave of specialized capabilities, especially in areas like agent prompting and broader creativity-focused tools. It’s going to get weird.
Agents
- Still poorly defined, but I'm talking about models that have the agency to complete tasks with tools and memory. They've been around for ages, and models are starting to get good at knowing how and when to use tools.
- IDEs have gained agent-like capabilities, and this pattern should carry over into more kinds of work applications beyond writing code.
- Full computer use will remain extremely experimental, but sandboxed tools will do more of our day-to-day work.
- Models will continue to get better at knowing exactly which tool to use and when.
- Broader standardization of tools, agent marketplaces, common integrations between big lab models
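To make "tools and memory" a bit more concrete, here is a minimal sketch of the loop most agent setups seem to share: the model looks at the task and what it remembers, then either asks for a tool or gives a final answer. Everything here is an assumption for illustration. `fake_model` is a hard-coded stand-in for a real LLM call, and the tool names are hypothetical, not any vendor's API.

```python
from typing import Callable


# Hypothetical tools: plain Python functions the "model" is allowed to call.
def add(a: float, b: float) -> float:
    return a + b


def word_count(text: str) -> int:
    return len(text.split())


TOOLS: dict[str, Callable] = {"add": add, "word_count": word_count}


def fake_model(task: str, memory: list[dict]) -> dict:
    """Stand-in for an LLM call (an assumption, not a real API).

    Returns either a tool request or a final answer, depending on
    what is already stored in memory.
    """
    if not memory:
        # Nothing remembered yet, so ask for a tool.
        return {"action": "tool", "name": "word_count", "args": {"text": task}}
    # A tool result is in memory, so wrap up with a final answer.
    last = memory[-1]
    return {"action": "final", "answer": f"The task has {last['result']} words."}


def run_agent(task: str, max_steps: int = 5) -> str:
    memory: list[dict] = []  # running record of tool calls and results
    for _ in range(max_steps):
        decision = fake_model(task, memory)
        if decision["action"] == "final":
            return decision["answer"]
        tool = TOOLS[decision["name"]]  # look up the requested tool
        result = tool(**decision["args"])
        memory.append({"tool": decision["name"], "result": result})
    return "Stopped: step limit reached."


print(run_agent("summarize the quarterly sales report"))
# prints "The task has 5 words."
```

The step limit is the crude kind of guardrail a sandboxed tool would want. Swap `fake_model` for a real model and the interesting part becomes how well it chooses from `TOOLS`.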
Research Tools
- Tools like Google's Gemini Deep Research could revolutionize data analysis by scanning the web to produce comprehensive reports, streamlining information gathering for developers and researchers.
- There will be several offerings in this space.
Generative AI:
- Some folks won't be able to tell what's fake; some folks will think everything is fake.
- There will be significant progress in physics and world models in general.
- 3D worlds: generated from images, mixed reality, etc.
- More tooling around professional workflows, e.g., tools to make songs, videos, etc.
- Generative Audio Workstations (GAWs): AI in music production is leading to the development of GAWs, which incorporate AI-driven features for composition and sound design.
Shifting Job Landscapes
- "LLM-ops" and "AI Engineer" roles will be ubiquitous; every company will need them.
- Business Adoption: Companies are embedding AI into workflows to enhance efficiency and innovation. The role of Chief AI Officers is becoming more prevalent, emphasizing AI's strategic importance.
- The US-China AI rivalry heats up, with the stakes centered on regulation, GPU supply chains, and strategic open-sourcing moves.
- Sector-Specific AI: Tailored AI solutions are emerging to address unique challenges in sectors like healthcare, finance, and manufacturing. Expect more industry-specific applications.
Large Language Models
By late 2025, open-source models hit their stride, making agentic coding a reality. Imagine this: laptops running models powerful enough to replace some of today’s top-tier offerings, finally bringing advanced coding AI to anyone with on-prem privacy concerns.
- The slow and steady march forward:
- The reasoning crank will turn a few times: slightly sharper reasoning and generally better performance.
- Faster models, longer contexts (think 500k to 1M+ tokens), and lower costs.
- Shrinking models getting scarily close to their beefier counterparts’ performance.
- Bigger models maxing out benchmarks, making it harder to figure out what’s “good enough.”
- Labs will need to up their UX game to compete with free/cheap offerings.
- Multimodal models everywhere:
- Think text-to-image edits for the masses (e.g., Gemini 2), richer audio interactions, and multi-modal workflows as standard.
- Complex tokenization will finally catch up to these ambitious benchmarks.
- Audio capabilities, like sound effects and music, might become more refined, but safety concerns will slow adoption.
- Benchmarking Innovation:
- More companies providing better benchmarking systems.
- Expect big ARC-like tests and new benchmarks to dominate the scene.
Vendors:
OpenAI
- What to Expect in 2025:
- O3-mini and the full O3 are likely to land in Q1 and later in 2025, respectively, bringing smoother performance and updated benchmarks.
- The biggest mystery remains whether O3 is based on a whole new training run or just additional fine-tuning on GPT-4. My hunch? It’s probably a mix—post-training on GPT-4 with refined reasoning chains. The RL strategy scales well, but without an entirely new base, it might not feel revolutionary.
- There have been whispers of multiple large runs that didn’t quite hit their targets.
- Real-world performance (the "vibes check") will matter. Latency, coding efficiency, and reasoning abilities will determine if O3 dethrones favorites like Sonnet 3.5. My bet? It’ll be fast and it’ll perform great, but users will nitpick the details, and people will still use Sonnet for a time.
- Expect OpenAI to make a splash with tooling: inline interpreters, a plugin system for their canvas, and deeper app integrations ("Work With" should finally deliver).
- Agentic capabilities might go mainstream with a marketplace for custom agents and robust browser/desktop integrations. We might even see an OpenAI browser forked from Chromium to support sandboxed agents.
- Visual AI could see big updates, with DALL-E 4 bringing interleaved image editing to the masses, and Sora getting a price cut and new capabilities.
Google
- Gemini 2:
- Google will continue to push affordable and polished models to capture market share. Gemini 2’s tool use and conversational search improvements are just the start.
- Gemini 2.0 Pro will top leaderboards for some time and lead in certain benchmarks, but might not pull ahead in specialized tasks like coding.
- Gemma 3:
- Gemma 3 could dominate lightweight LLM use cases. Flash 8b showed Google’s knack for density and efficiency, and this model could be their crown jewel.
- Expect upgrades in context length and multimodal support.
- NotebookLM:
- Deeper voice integration, podcasting innovations, and research tools, plus the Gemini 2 treatment.
Anthropic
- Anthropic will release Opus 3.5. It was still not available as of Dec 2024, which could be because a) it's still under development or b) performance is not a significant enough leap to justify the cost. If the platonic representation hypothesis is true, then all models are converging or crystallizing toward some ideal shared form. It could be that these extra-large models are better for training than for public use, or perhaps the ultra models don't look great on benchmark graphs and labs will drop the third tier altogether.
- Claude 4 models should drop at some point in 2025. Expect the usual improvements; true multi-modality is overdue.
- It will be interesting to see whether Anthropic pursues thinking-token models or something different.
- Some kind of voice offering would be a good fit for Anthropic; Google and OpenAI both have great voice modes.
- Broader support for MCP; hopefully other vendors will get on board with tooling standards.
- More impressive low-key releases
Meta
- Llama 4 will beat the top-scoring open-source models, outperforming Qwen2.5 and DeepSeek-V3, and will probably beat older versions of GPT-4o and Sonnet.
- An enhanced pair of Meta Ray-Bans with a simple screen will be announced, though not quite an interim Orion.
- A cheaper Orion will be announced for later release, with a smaller FOV than the silicon carbide version.
- We'll hear more about the neural wrist interface, maybe as a product integration with Quest 4 or new Meta Ray-Bans (or both).
- In late 2025, Quest 4 will either be announced or leak enough that we'll know the specs.
- Will start with low and high tiers, not released years apart
- Expect face tracking
- ML coprocessor for Codec Avatars
- Much better SoC
- 16 GB
- This generation's Pro variant was cancelled, so expect nothing there.
- Third-party headsets running Horizon OS with Pro-like specs will be announced.
Nvidia
- RTX 50 Series:
- Witcher 4 demos? Cyberpunk 2 teasers? Nvidia knows how to wow at CES. DLSS enhancements and 32GB of memory on the 5090 are almost certain.
- Nvidia might launch LLMs tuned for the 50 series hardware, blurring the line between software and hardware optimization.
- More big open-source models like Nemotron; Nvidia will also publish some significant new research.
Final Thoughts
2025 feels like a year of refinement rather than revolution. AI tools will get faster, cheaper, and more capable, but the big breakthroughs might be more subtle. Expect everyone—users, labs, and vendors—to chase "the vibes" as much as raw performance. It’s going to be a wild ride.