Something I’m starting to dislike about this learning journey is that I’m building many prototypes (which is great) but not actually shipping them to end users. The actual process of pushing to production isn’t technically hard. With something like Vercel, it’s much simpler than before.
Then, what’s the value of that extra step?
Well, my goal in learning more about AI is to feel more comfortable working on it, slowly develop taste in AI, and eventually ship a useful AI product. Everything I have been doing and writing about here is a means to an end: a useful AI product.
Even though shipping to production is technically not educational, getting people to use my product is. But it also involves setting up a database, integrating Stripe for payments, getting a domain, and so on. If I’m not careful, these will distract me without adding value to my learning journey.
So, I want to keep the boilerplate stuff to a minimum and focus on learning new technologies, understanding how they can be used, and building something people want to use. I already have a little spark of an idea to work on. I hope to have some progress to share by next week!
With that said, here’s my project of the week.
Wrapping AI wrappers
After learning about DSPy, I wanted to see if I could build anything useful with it.
I used to be non-technical and had to rely on my teammates for any bit of technical work. A Buffer colleague once wrote a step-by-step guide for me on how to edit the code of our blog but I still had to ask him a ton of questions. After learning to code, I appreciate the ability to build my own tools.
Even though we have many consumer AI tools, such as ChatGPT, you have to use their APIs if you want to build a tool powered by them. There are sophisticated workflow builders such as n8n and Gumloop, but most of my friends said they felt overwhelmed just looking at them.
What if I could let my non-technical friends easily build their own AI tools?
Introducing the AI wrapper builder!
Thanks to DSPy, we can create an AI tool simply by defining the inputs and outputs. No prompt engineering is required.
My AI wrapper builder is essentially a form builder. You add your desired fields (inputs) and expected results (outputs) to a form. In the screenshot above, the form takes in an image and a caption style to produce an Instagram caption.
In DSPy’s language:
image, caption_style -> instagram_caption
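To make that concrete, here is a minimal sketch of what the builder could generate behind the scenes with DSPy (the model choice and field names are my own, and I pass a text description in place of the actual image to keep things simple):

import dspy

# Assumes an OpenAI API key is configured; the model is just an example.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class InstagramCaption(dspy.Signature):
    """Write an Instagram caption for the given image."""
    image_description: str = dspy.InputField()
    caption_style: str = dspy.InputField()
    instagram_caption: str = dspy.OutputField()

captioner = dspy.Predict(InstagramCaption)
result = captioner(
    image_description="A latte with heart-shaped foam art on a wooden table",
    caption_style="playful",
)
print(result.instagram_caption)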
Here are a few other examples:
Website summarizer: Website URL[1] → Summary
Video transcriber: Video → Transcript
Price scraper: Ecommerce website URL → Product name, prices
Once your AI tool is built, you can use it or share it with your teammates.
If the results produced by the form are not good enough, you can add rules and examples to improve the generation.
Again, this leverages DSPy under the hood. Rules push DSPy to find a prompt that satisfies them (e.g. the Instagram caption should be shorter than 50 characters), while examples let DSPy test candidate prompts on our specific use cases, use them as few-shot examples in the prompt, and even finetune the underlying model.
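As a rough illustration of how this could map onto DSPy (reusing the InstagramCaption signature from the sketch above; the metric and example values are made up):

import dspy
from dspy.teleprompt import BootstrapFewShot

# A "rule" expressed as a metric: the caption must be 50 characters or fewer.
def caption_is_short(example, prediction, trace=None):
    return len(prediction.instagram_caption) <= 50

# User-provided examples become a tiny training set (values here are invented).
trainset = [
    dspy.Example(
        image_description="A golden retriever puppy asleep on a beach towel",
        caption_style="playful",
        instagram_caption="Beach naps > everything",
    ).with_inputs("image_description", "caption_style"),
]

optimizer = BootstrapFewShot(metric=caption_is_short)
better_captioner = optimizer.compile(dspy.Predict(InstagramCaption), trainset=trainset)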
But it can do even more.
AI agents are essentially LLMs with tools (and a bunch of other things, such as memory).
What if we could create an agent using the AI tools we have built?
That’s what the Agent studio is for.
First, describe what you want the agent to do and connect the accounts it will need access to. Using a powerful reasoning model, the agent will create a plan to complete the task and figure out workarounds when certain steps fail.
Then, customize the agent by selecting the relevant tools from a list of pre-created tools or your own AI tools. This will allow the agent to do things beyond simply generating text.
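Under the hood, this could be as simple as handing those tools to DSPy's ReAct module. A rough sketch, with hypothetical placeholder tools standing in for the real integrations:

import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # example model choice

# Hypothetical placeholder tools; real ones would call the user's connected accounts.
def summarize_website(url: str) -> str:
    """Return a short summary of the page at the given URL."""
    return "Placeholder summary of " + url

def post_to_instagram(caption: str) -> str:
    """Publish the caption and return a confirmation message."""
    return "Posted: " + caption

agent = dspy.ReAct("task -> result", tools=[summarize_website, post_to_instagram])
outcome = agent(task="Summarize https://example.com and draft an Instagram post about it")
print(outcome.result)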
You can even schedule the agent to work on a recurring basis.
While the idea sounded good initially, I eventually felt it was too generic and not specialized for any industry or workflow. It is hard to see who the first users might be, and hard to market it. Even though I love building horizontal tools because they would be useful to more people, our startup experience so far has taught us to focus on a specific use case and customer profile.
I won’t be working on this project further but if anyone is interested in the code or wants to develop it further, let me know!
Quick disclaimer: This is only the frontend, and I have not hooked it up to DSPy on the backend.
Jargon explained
pip install -e .: We have been developing a Python package, and I have been building an app that uses it to test it. When I wanted to edit the Python package’s source code and test the changes in my app, I learned that

pip install -e .

installs our Python package in editable mode: local changes to the package’s source code are reflected immediately without reinstalling it. I can then run

pip freeze | grep -E "^-e"

to check which packages are installed in editable mode and where they live on my computer.

Structured output and constrained decoding: I only learned this week that OpenAI’s API guarantees the correct structure of our outputs if we provide a JSON schema. Embarrassingly, I used to re-generate a response whenever the output structure was incorrect. OpenAI achieves 100% reliability via a technique called constrained decoding[2]: in simple terms, the model can only pick tokens that match the provided JSON schema. Google Gemini also offers this, and other major AI providers will probably follow. (If you are interested in more on this, there is a recent write-up about adding structured data extraction to a command line tool.)
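For example, with the OpenAI Python SDK you can pass a Pydantic model and the SDK converts it into a JSON schema for constrained decoding (a minimal sketch; the model name is just an example):

from pydantic import BaseModel
from openai import OpenAI

class Caption(BaseModel):
    instagram_caption: str
    hashtags: list[str]

client = OpenAI()  # assumes OPENAI_API_KEY is set
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",  # a model that supports structured outputs
    messages=[{"role": "user", "content": "Write a caption for a latte photo."}],
    response_format=Caption,  # the SDK turns this into a JSON schema
)
print(completion.choices[0].message.parsed)  # a validated Caption instance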
Literal type in Python: This type lets you specify the exact set of values a variable can take (e.g. 200, 400, and 404 in the example below). Besides helping with type checking, it makes the code easier to understand because you can see at a glance which values are allowed.
from typing import Literal

def get_status(code: Literal[200, 400, 404]) -> str:
    if code == 200:
        return "OK"
    elif code == 400:
        return "Bad Request"
    elif code == 404:
        return "Not Found"
    else:
        raise ValueError("Invalid status code")
Interesting links
Claude 3.7 Sonnet and Claude Code: Anthropic finally dropped a new model, Claude 3.7 Sonnet, which seems even better at coding than the already impressive Claude 3.5 Sonnet. The team also launched a command line tool, Claude Code, that uses the new model to work on coding tasks on our computers.
Introducing GPT-4.5: Not wanting to lose the limelight, OpenAI also launched GPT-4.5. While it hasn’t blown people’s minds like Claude 3.7 Sonnet did, it seems to have higher EQ, which I personally think is something that has been overlooked as everyone focused on intelligence.
OpenAI's Deep Research demonstrates that reasoning does not solve hallucinations: My friend wrote a long post sharing his thoughts on the current state of AI, especially since Deep Research seems to have generated quite a lot of hype.
Recent issues
[1] DSPy doesn’t process URLs directly, but we could use a scraper like Beautiful Soup behind the scenes whenever a user adds a URL field to their form.
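A rough sketch of what that behind-the-scenes step could look like with requests and Beautiful Soup (the helper name is my own):

import requests
from bs4 import BeautifulSoup

def fetch_page_text(url: str) -> str:
    """Download a page and return its visible text for the LLM to work with."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    return soup.get_text(separator="\n", strip=True)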