On Proactive AI Agents
Vibe coding, MCP, and a platform empowering useful agents

Return the Fund 🚀
The frontier of tech-focused VC research
In today’s email
Vibe coding, MCP, GPT-4o, and the missing thread
My favorite employee
Proactivity in software, and in AI agents, is not real
But proactivity can still be done, incredibly powerfully
Outlining a new platform idea: the MCP of proactivity
State of the union
As everyone and their mother vibe-codes MCP servers, and interns order Uber Eats from their code editors, I'm left thinking about value.
Quick hint: Model Context Protocol (MCP), developed by Anthropic, is an open standard for LLMs to communicate with external systems. Think of it as an adapter to easily add tools to an AI agent.
The advancements of late are truly inspirational. GPT-4o generating images with perfect text, mass adoption of MCP, and vibe coding tools producing stunning websites are all moving us in the direction of value creation.
It has never been easier to generate perfect marketing assets, thanks to GPT-4o.
Because of MCP, end users don’t need custom integrations to make agents useful externally.
Llama 4 Scout’s 10M-token context window (and still-performant inference) enables seamless comprehension of extensive texts.
With vibe coding tools like Vercel v0 and Lovable, companies can deploy responsive landing pages in minutes.
But this is already abundantly clear. And while I’m incredibly excited about these tools, I often see people throw the baby out with the bath water. There’s an anchoring thread of realism missing from this narrative.
The other day, I saw something on YouTube that made me chuckle.
Reality check

The vibe coders, so immersed in the vibes, are now stuck debugging their spaghetti code. I laugh, but I’m bullish—my company spends thousands of dollars a month on AI tools, and they unequivocally provide more value than they cost.
Realistically, they are just that—tools, and nothing more. Entrusting them with end-to-end development and logical integration is a fool’s errand. The most prolific AI companies have understood this for years. As part of a deep dive on AI infrastructure, I talked to many of them with my friend Jonathan Shriftman. My greatest takeaway was their disappointment in the supposed step-change unlocks of new model intelligence.
As models incrementally improve, they unlock new tasks completable with zero-shot prompting (no examples, no further corrections). Much of this is post-trained intelligence, like Anthropic hiring top UI developers to fine-tune Claude with the most modern implementations of frontend design.

What I haven’t seen is a superintellectual disintermediation of informed engineering. When you do something a thousand times and collect nuggets of insight along the way, you build a gut sense. You intuit when something doesn’t feel right, when something could go wrong, how to preemptively mitigate future risks, and how to approach novel complex problems.
While we’re making progress, particularly with the proliferation of advanced reasoning models like OpenAI’s o1, DeepSeek’s R1, and even hybrid reasoning models like Anthropic’s Claude 3.7 Sonnet, it’s unclear how we bridge the intuition gap.
Models are trained on billions of lines of code from GitHub, with all sorts of imperfections and hidden vulnerabilities. Perhaps an answer is the dataset—somehow only allowing source code from actively maintained, production systems with high SLAs (99.9…% uptime guarantees)? After all: shit in, shit out. But this is the hardest code to access. Ultimately, I don’t know the answer; but my point is, we’re not there yet.
Proactive AI agents
I've been deeply considering what makes an employee valuable, especially over the past week.
When you run a small company ($1M to $10M ARR) involved in many ventures, optimally allocating labor is pivotal.
My ideal employee is someone I entrust with ownership over that which I’m constrained by. Someone who translates high-level ambition into grounded execution, while embracing the necessity to enter the mud—navigating low-level event loops and sub-operations with diligence. They act as a reliable interpreter and executor of strategic intent, being the final backstop and responsible party of their projects. They proactively uncover blind spots, anticipate bottlenecks, and establish clarity amidst ambiguity.
A couple days ago, unbeknownst to me, a production error percolated through our logs. Relatively benign, affecting only a single user’s account. One of my employees, who exemplifies the above description, had already noticed it, replicated it, and tested possible fixes. One might think, “well, that’s his job.” But as I’m thinking about the role of AI tools in our organization, I don’t take him for granted in the slightest. I have full confidence in him to maintain a perpetual radar and respond to external events as they occur (like unusual logs).
This is exactly what I’m missing from today’s AI tools.
So how do we define that missing behavior within the framework of AI agents? Well, as Jonathan and I discussed during our aforementioned research: proactivity.
Proactivity: the capacity to autonomously anticipate one’s needs, independently identify opportunities without explicit instruction, and subsequently initiate purposeful actions.
This isn’t real
Unfortunately, proactivity doesn’t exist in software. And if we’re getting philosophical, I postulate that proactivity doesn’t exist within humans either.
Proactive software is but reactive code orchestrated to respond to unseen triggers.
Let’s consider payroll and HR tools like Rippling and Deel. Founders love software that feels like it’s watching their backs, which (good) HR software does.
It keeps track of my employee documents and reminds me well before anything expires.
It makes sure payroll runs smoothly and precisely on schedule, every time.
It monitors tax and labor laws in real-time, instantly letting me know when something relevant changes.
It gently nudges candidates about their job offers before deadlines pass.
It warmly welcomes new hires with onboarding emails the moment they accept an offer.
It proactively encourages employees to enroll in and customize benefits.
The software feels like a full HR department working 9-to-5 to proactively manage labor. It anticipates issues, monitors external resources, and communicates with whomever necessary. Feels sentient.
But once again, this proactivity is but a mirage: simply a series of reactions to hidden triggers.
Expiring Documents: Triggered by daily automated checks comparing document expiration dates.
Payroll Runs: Triggered by a scheduled internal timer hitting the payroll date.
Compliance Alerts: Triggered by receiving external compliance updates (ex. new laws).
Offer Letter Reminders: Triggered by a timer counting down to the offer expiration date.
New Hire Welcomes: Triggered by the status change event of a candidate accepting an offer.
Benefits Enrollment Notices: Triggered by preset calendar events reaching open enrollment dates.
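To make the mirage concrete, here's a minimal sketch of the expiring-documents case above. The schema and notifier are hypothetical (this is not Rippling's actual code); the point is that the only "proactive" ingredient is a scheduler firing a plain reactive check:

```python
from datetime import date, timedelta

# Hypothetical records; a real HR system would pull these from its database.
DOCUMENTS = [
    {"employee": "Ada", "doc": "work visa", "expires": date(2025, 6, 1)},
    {"employee": "Grace", "doc": "I-9", "expires": date(2026, 1, 15)},
]

def send_reminder(doc: dict) -> None:
    # Hypothetical notifier; in production this would email or Slack someone.
    print(f"Reminder: {doc['employee']}'s {doc['doc']} expires {doc['expires']}")

def check_expiring_documents(today: date, window_days: int = 30) -> None:
    """The 'proactive' feature: a reactive scan, fired by a daily scheduler."""
    for doc in DOCUMENTS:
        if doc["expires"] - today <= timedelta(days=window_days):
            send_reminder(doc)

# The hidden trigger: a cron entry like '0 9 * * *' invoking this every morning.
check_expiring_documents(date.today())
```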

The Rippling example is fairly high-level, but the theory of proactivity as a reaction to hidden triggers holds true in low-level systems as well. Conventionally asynchronous constructs—HTTP servers, async/await loops, OS schedulers, cron jobs, interrupt handlers, and promises/futures—are in fact reactive; their sophisticated abstractions merely create the compelling illusion of proactive behavior.
Building a proactive agent
Today, we communicate with AI agents through conversation.
Its input is human text; its output is interpretable by the same human. Such a system enables incredible chat experiences; we have a whole vertical of customer support agents proving invaluable to startups.
But these systems are dormant until invoked by a human’s chat request. Once alive, they might loop through countless tools before reaching a final answer… But they’re still operating within the timespan of human invocation.
Given the concept of proactivity as reactivity with hidden triggers, we can build a proactive AI agent by building, well, hidden triggers.
For example: a new email is received, Google Cloud Run hit an error, my Uber is delayed, a customer opened a charge dispute, the WSJ just published a story.
These external services, not a human, must invoke the AI agent, triggering the agent’s loop as it uses tools to reach a completion state. Its final answer may be, “cool, whatever, I don’t need to do anything.” Or, “looks like this server error is pretty bad, let me open a GitHub issue.” Or, “I’ve never seen such an angry customer email. Alert my human immediately.”
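A minimal sketch of that agent-side triage. The event shape, the model call, and the tool functions are all hypothetical stand-ins for whatever SDK you use; what matters is the structure: classify, then ignore, act, or escalate.

```python
def call_llm(prompt: str) -> str:
    # Stand-in for a real model call (OpenAI, Anthropic, PydanticAI, ...).
    # A real agent would reason over the payload; this stub just escalates errors.
    return "ESCALATE" if "error" in prompt.lower() else "IGNORE"

def open_github_issue(event: dict) -> None:
    print(f"Opening GitHub issue for {event['source']}")  # e.g. via an MCP tool

def alert_human(event: dict) -> None:
    print(f"ALERT: {event['payload']}")  # e.g. text or Slack the human

def handle_event(event: dict) -> None:
    """Invoked by an external service's webhook, not by a human chat message."""
    decision = call_llm(
        "An external event occurred. Reply IGNORE, ACT, or ESCALATE.\n"
        f"Source: {event['source']}\nPayload: {event['payload']}"
    )
    if decision.startswith("IGNORE"):
        return  # "cool, whatever, I don't need to do anything"
    elif decision.startswith("ACT"):
        open_github_issue(event)
    else:
        alert_human(event)

handle_event({"source": "cloud-run", "payload": "500 error rate spiking"})
```

Swap the stub for a real model call and the prints for MCP-backed actions, and you have the skeleton of a proactive agent.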

If the model is fine-tuned on ideal output cases and prompted according to its user’s needs, it can accurately preempt responses to external events. It can notice a server error and open a ticket. It can draft replies to customer emails. It can alert humans when something consequential is reported by the news. It can recommend that you follow up with a friend. It can keep you on time throughout the day. There’s no end to the possibilities unlocked.
Working backwards from this vision, I’ll outline an opportunity to build the infrastructure for proactivity.
The opportunity
First, a couple quick definitions. An API is how Software A communicates with Software B. It’s how my code (A) can connect to Gmail (B), reading and sending messages programmatically.
A webhook is how Software A receives messages from Software B, at the discretion of Software B. It’s how my server (A) can wait for Gmail (B) to say, “hey, I just got an email, in case you want to do something about it.”
Webhooks are the rails for proactive agents.
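In code, the difference is purely about who initiates. A rough sketch (the webhook path and payload are illustrative; Gmail's real push notifications actually route through Cloud Pub/Sub):

```python
import requests                       # pip install requests
from fastapi import FastAPI, Request  # pip install fastapi

# API: my code (A) asks, Gmail (B) answers. Pull.
def list_messages(token: str) -> dict:
    return requests.get(
        "https://gmail.googleapis.com/gmail/v1/users/me/messages",
        headers={"Authorization": f"Bearer {token}"},
    ).json()

# Webhook: Gmail (B) calls my server (A) whenever it wants. Push.
app = FastAPI()

@app.post("/webhooks/gmail")
async def on_new_email(request: Request) -> dict:
    event = await request.json()  # "hey, I just got an email..."
    return {"received": True}
```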
Returning to my earlier diagram—there’s actually another layer beneath. Services don't invoke the agent directly; instead, they communicate with an API server, which in turn activates the agent.

In my aforementioned research, I presented this diagram breaking down the sublayers of the AI infrastructure stack.

The Model Context Protocol lives in the integration layer. MCP servers are attached to a model’s set of tools, such that when invoked, the model can take action using external services. MCP doesn’t care what invoked the model.
Right now, MCP enables agent → server communication. What we’re missing is webhook orchestration to enable server → agent communication.
A webhook reception system receiving messages from outside services would not live in the integration layer; it would sit under orchestration. The two layers work together, as I’ll demonstrate.
The server, whose address is given to external services (saying “hey, ping me here when you have an update”), plays air-traffic controller with inbound messages. It passes them to the agent, which then determines whether to (a) ignore them, or (b) use its tools to take action. Its human is none the wiser, as this entire process occurs in seconds.
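A minimal sketch of that air-traffic controller, reusing the `handle_event` triage loop from the earlier sketch (stubbed here). FastAPI's BackgroundTasks lets the server acknowledge the webhook immediately while the agent works:

```python
from fastapi import BackgroundTasks, FastAPI, Request

app = FastAPI()

def handle_event(event: dict) -> None:
    # The triage loop sketched earlier: ignore, act, or escalate.
    print(f"Agent invoked by {event['source']}")

def normalize(source: str, raw: dict) -> dict:
    """Flatten each service's webhook dialect into one LLM-friendly envelope."""
    return {"source": source, "type": raw.get("type", "unknown"), "payload": raw}

@app.post("/hooks/{source}")
async def inbound(source: str, request: Request, tasks: BackgroundTasks) -> dict:
    raw = await request.json()
    tasks.add_task(handle_event, normalize(source, raw))
    return {"ok": True}  # ack instantly; the agent finishes within seconds
```

Ack-fast-then-work is the standard webhook pattern; most providers retry deliveries that respond slowly or fail.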

Such a system works harmoniously with MCP, not in spite of it. Remember, MCP doesn’t care how the agent was invoked; it purely provides tooling to the agent. Webhook orchestration empowers the agent with proactivity, while MCP makes the agent useful on invocation.
A new platform
What I’d love to see (and maybe build?) is an MCP-style marketplace for webhook-based agent orchestration. The steps are simple.
Connect your agent via a webhook URL.
Connect services via the dashboard, logging into each one directly via OAuth (“sign in with Google”).
Choose which hidden triggers to enable, making your agent proactive.
As events occur, the platform formats information into LLM-friendly inputs and orchestrates your agent.
That’s it. The rails for proactivity; the MCP for server → agent communication.
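What might those "LLM-friendly inputs" look like? One envelope shape regardless of source, so agents never have to parse ten different webhook dialects. A hypothetical example:

```python
# A hypothetical normalized event, identical in shape whether it came
# from Gmail, Stripe, or Google Cloud Run.
event = {
    "source": "gmail",                # which connector fired
    "trigger": "email.received",      # which hidden trigger the user enabled
    "occurred_at": "2025-04-10T14:32:00Z",
    "summary": "New email from a customer disputing a charge",
    "payload": {                      # raw fields, preserved for tool calls
        "id": "msg_4821",
        "sender": "customer@example.com",
        "subject": "Dispute on invoice #1042",
    },
}
```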
And true to my opening words, I vibe-coded an MVP.




(If you want a link to the deployed MVP, hit reply and I’ll send it to you.)
Such a protocol, accessible via a platform, would abstract the concept of proactivity as reactivity with hidden triggers. It would allow builders to easily empower agents to preemptively take action on external events, the same way MCP empowers agents to seamlessly operate outside services.
An important distinction
One way to intuit the difference between MCP and a proactivity protocol is by thinking of inputs and outputs. They work together, but accomplish very different things.
MCP gives an agent tools. Think of a tool as an ability, like “Send Email” or “Reply to Email by ID.” The proactivity protocol defines the types of events that come back from Gmail, like “a new email has been received, containing an ID, sender, subject, timestamp, history, and body.”
As pieces of the puzzle of a proactive agent, while they both facilitate communication with Gmail, they are entirely different. In fact, MCP servers could bolt onto the proactivity protocol platform—wouldn’t it be nice if your agent could receive an email (protocol), note its ID, and then reply to it (MCP), without requiring a separate implementation?
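In types, the distinction is stark (hypothetical names throughout): a tool is an ability the agent calls, in the output direction; an event is data pushed at the agent, in the input direction.

```python
from dataclasses import dataclass

# MCP side: an ability the agent can invoke.
def send_email(to: str, subject: str, body: str) -> str:
    """A tool exposed via an MCP server; returns the sent message's ID."""
    ...

# Proactivity-protocol side: an event type Gmail pushes to the agent,
# with the fields described above.
@dataclass
class EmailReceived:
    id: str
    sender: str
    subject: str
    timestamp: str
    history: list[str]
    body: str
```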
How it succeeds
My two reasons why MCP is successful:
It's open-source and an open standard (despite being developed by Anthropic). Anyone can build, release, fix, and implement MCP adapters.
Mass-adopted protocols live and die by network effects. In my eyes, the key moment for MCP was OpenAI’s decision to follow the standard within its GPT models. Again, MCP is built by OpenAI’s #1 rival—but the open standard status and network effect of adoption are simply too powerful.
This proactivity protocol would, like MCP, be an open-source standard for LLM-friendly inbound connectors. Any company can build a connector for themselves; ex. Google could build one for Gmail, defining hidden triggers (email received, email bounced). They’d wrap it in Google OAuth, letting users log in with Google to authorize inbound events to their agents.
Or, any developer could build a connector around a public API, as long as it’s authorizable (OAuth, API key).
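A connector, then, is little more than an auth wrapper plus a declared set of triggers. A hypothetical Gmail connector manifest (all field names are illustrative, not a real spec):

```python
# Hypothetical connector manifest under the proposed open standard.
GMAIL_CONNECTOR = {
    "name": "gmail",
    "auth": {
        "type": "oauth2",
        "provider": "google",  # "sign in with Google" on the dashboard
        "scopes": ["https://www.googleapis.com/auth/gmail.readonly"],
    },
    "triggers": {  # the hidden triggers users can enable
        "email.received": {
            "fields": ["id", "sender", "subject", "timestamp", "body"],
        },
        "email.bounced": {
            "fields": ["id", "recipient", "reason"],
        },
    },
}
```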
The platform would employ a Docker-style freemium model—open-source core paired with an optional paid hosting service—to provide users simplicity, sparing them the burden of technical intricacies.
Some existing incumbents stand to benefit tremendously from the proliferation of proactive agents.
Serverless cloud providers, ex. GCP’s Cloud Run, Azure Functions, and “newer” players like Temporal
Platforms who’ve abstracted OAuth connectors, like Zapier, Pipedream, and Paragon
Open-source SDKs facilitating tool-based agent orchestration, like PydanticAI (my favorite)
Final thoughts
It’s worth reiterating that this protocol works hand-in-hand with the Model Context Protocol. When invoked, the protocol-orchestrated agent leverages MCP to integrate directly with your tools—sending texts, prompting follow-ups, replying in Slack, drafting emails, or suggesting alternate calendar times.
The protocol is infrastructed with webhooks. I’m extremely bullish on webhooks, which may seem a crazed statement to engineers given webhooks have existed for decades. I just think they’re categorically underutilized in the context of agents. Webhooks are the solution to the orchestrational void in server → agent communication.
I’m excitedly building proactive agents, manually connecting them to my most trusted services. And I’m already blown away by how much value is unlocked—it’s like having a second brain, always on the lookout, hacking away at the leaves ahead of you as you traverse the jungle.
I’m confident the agent space will move in this direction; but, as happened when I toyed with Jeeves, I’m sure the true value will be in infrastructing our impending paradigm shift. Everyone will have their own agent, but perhaps every agent will use the same protocol.
Anyways. Something I’m thinking about as we vibe-code ourselves into oblivion.
-Prerit