My cofounder SK and I find startup ideas by quickly building and launching multiple ideas, killing off those that don’t work, and investing in those that take off. Over the past few weeks, we have been working on a few new AI projects.
Here are some updates and lessons from three recent projects:
Stores: A Python library for equipping AI agents with tools and a literal library of tools (sneak preview!)
Muse: An AI writing companion Chrome extension
Otaro: A Python library for programmatically optimizing prompts
Stores (a sneak peek)
We will launch a new project next week, but as a subscriber of this newsletter, you get a sneak peek!
Since the start of 2025, SK and I have been prototyping several AI agents. While people are still debating what exactly an AI agent is, a simple definition is: an AI model that can use tools to extend its capabilities.
For example, an AI model that can search the web is an AI agent. The AI model has a tool that allows it to search the web. We can build all sorts of tools for AI models to use.
A tool to browse Reddit.
A tool to send emails.
A tool to edit files.
A tool to draw.
And so on.
Most developers currently build their AI agents to follow a fixed workflow. For example, the AI agent might browse the web for relevant information first, then summarize the information, and then email the summary to the user.
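To make this concrete, here is a minimal sketch of such a fixed workflow (the three helper functions are hypothetical stand-ins for real tools):
# A minimal sketch of a fixed workflow. The three helpers are hypothetical
# stand-ins for real tools (web search, an LLM call, an email API).
def search_web(query: str) -> list[str]:
    return [f"Result for: {query}"]  # placeholder for a real search tool

def summarize(results: list[str]) -> str:
    return " ".join(results)  # placeholder: a real app would call an LLM here

def send_email(recipient: str, body: str) -> None:
    print(f"To {recipient}: {body}")  # placeholder for a real email API

def run_fixed_workflow(query: str, recipient: str) -> None:
    # The steps always run in the same order, regardless of the task
    results = search_web(query)
    summary = summarize(results)
    send_email(recipient, summary)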
But as AI models become even smarter, we can let them figure out how to complete tasks. In fact, most leading AI models, even Claude 3.5 Sonnet, are already good enough at creating plans, using tools, and finding alternative solutions when stuck.
In such a world, building AI agents will be less about creating fixed workflows and more about creating powerful tools they can use. We are already seeing this shift. For example, to enable Claude to use a computer on our behalf, the Anthropic team created a Computer Use tool. This tool allowed Claude 3.5 Sonnet to beat the previous state-of-the-art performance in completing real-world computer tasks by a huge margin (22.0% vs 7.8%). OpenAI’s Operator is similarly an AI model¹ with a tool to use the browser.
When building our AI agents, we noticed we spent a lot of time building, testing, and fixing our tools. Even a tool to “simply” search on Google (without using their API) wasn’t exactly simple.
What if we make it super easy for developers to add tools to their AI agents without having to build those tools themselves?
There is no reason for developers to keep building tools from scratch when someone else has already built them. Imagine building a calculator yourself whenever you want to multiply two numbers.
Instead, developers should focus on picking the relevant tools and building other parts of their AI agents, such as the memory and user interface.
(In fact, I think developers might not even need to pick tools eventually.)
This thought led to Stores.
With Stores, you can easily add tools that we and other developers have created to your AI agents.
Adding the default tools, such as Google search, is as simple as this:
import stores
from langchain_google_genai import ChatGoogleGenerativeAI

# Load the default tools
index = stores.Index()

# Initialize the model and bind the tools to the model (with LangChain)
model = ChatGoogleGenerativeAI(model="gemini-2.0-flash-001")
model_with_tools = model.bind_tools(index.tools)
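Assuming the standard LangChain interface, you could then prompt the tool-equipped model and inspect the tool calls it wants to make (a sketch; executing the tool calls is up to your agent loop):
# Ask a question that should trigger the search tool
response = model_with_tools.invoke("What is the latest news about AI agents?")

# The model returns tool calls for your code to execute
print(response.tool_calls)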
You can also add tools from other developers or, if you really want to, your own custom tools (which are simply Python functions).
# Adding tools from other developers
index = stores.Index(["greentfrapp/file-ops"])
# Adding your custom tools
index = stores.Index(["./custom_tools"])
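Since custom tools are simply Python functions, a file inside ./custom_tools might look something like this (a hypothetical example; the exact conventions may differ once Stores launches):
# ./custom_tools/greetings.py (hypothetical example)
def say_hello(name: str) -> str:
    """Greet someone by name. The docstring tells the model what the tool does."""
    return f"Hello, {name}!"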
If a tool requires an API key, you can provide it securely via environment variables.
import os

index = stores.Index(
    ["./custom_tools"],
    env_vars={
        "./custom_tools": {
            "APP_API_KEY": os.environ["APP_API_KEY"],
        },
    },
)
We are in the super early stage of building Stores but will have a usable version by next week. If you are building AI agents, let’s chat! I’d love to see how we can help.
Muse
The Chrome Web Store team rejected my Muse app this week. Here’s the story behind it:
Last Friday night, after my son had gone to sleep, I tried to get Muse ready by Sunday so that you could try it. To move fast, I did things in parallel. Since the approval would take some time, I submitted the frontend for approval while I tried to figure out AWS Lambda for my backend.
The team rejected Muse because my frontend calls my local server (http://localhost:8000), so it doesn’t actually work (lol). They probably thought I was stupid, but I just wanted to move fast. I assumed I could swap the backend endpoint after they had checked the design and code of the frontend. To be fair to them, they have to check that extensions work so that the Chrome Web Store isn’t filled with broken ones.
Also, my plan wouldn’t have worked because updating the app also requires their approval. Oh well. For this reason, I prefer building web apps over Chrome extensions because I can update my web apps anytime. But a Chrome extension allows my users and me to use Muse as we write without needing a separate tab. I hope this trade-off is worthwhile. Let’s see!
On a positive note, I added a new feature before resubmitting Muse for approval. I can now add writing guidelines, which will lead Muse to give me more personalized feedback and suggestions. The guidelines can include my writing preferences, tone and style, and examples.
Behind the scenes, Muse stores the writing guidelines in my Chrome profile. This means the guidelines will sync across tabs and devices. I also don’t have to store them in a database, which means I don’t have to collect user data.
Sorry again for the delay. You should be able to try Muse by next week. I’ll share a Note as soon as it is approved.
Otaro
After playing around with DSPy to optimize prompts programmatically, we found a few things we wish it had:
An easy way to retrieve the optimized prompts (DSPy doesn’t even have an easy way to view the optimized prompts)
An easier way to define our inputs and outputs
An easier way to create and use rules to optimize the prompts
Since a tool for optimizing prompts will be useful for our future AI projects, we built Otaro with these improvements for ourselves:
Instead of writing DSPy signatures (input1, input2 -> output1), we can use a simple YAML config file. This makes the setup easily shareable, too.
# A sample config for converting an article into tweets
model: gemini/gemini-2.0-flash-001
inputs:
  - blog_content: str
  - tweet_count: int
  - tone_preference: str
outputs:
  - tweets: list[str]
Then, we can let Otaro optimize the prompts according to rules we specify. For example, we can write functions to check that each tweet has fewer than 280 characters, matches a particular tone, and contains no hashtags, and then add these functions as rules to the YAML config file. Otaro will programmatically test several prompts and check whether the results pass the rules.
# In the YAML config file, here's how the rules are listed:
rules:
  # Ensure tweets are under 280 characters
  - app.custom_rules.max_length(tweets, 280)
  # Ensure we generate the requested number of tweets
  - app.custom_rules.length(tweets, tweet_count)
  # Ensure tweets are in the requested tone
  - app.custom_rules.contains_tone(tweets, tone_preference)
  # Ensure tweets do not contain hashtags
  - app.custom_rules.no_hashtags(tweets)
# In app/custom_rules.py, here's what the max_length rule looks like:
def max_length(tweets, max_chars):
    """
    Ensure all tweets are under the maximum character limit.

    Args:
        tweets: List of tweet strings
        max_chars: Maximum number of characters allowed

    Returns:
        bool: True if all tweets are under the character limit
    """
    for tweet in tweets:
        if len(tweet) > max_chars:
            return False
    return True
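The other rules follow the same pattern: a plain function that returns True when the output passes the check. For example, here is what the no_hashtags rule might look like (my sketch, not necessarily the exact implementation):
def no_hashtags(tweets):
    """Return True only if no tweet contains a hashtag."""
    for tweet in tweets:
        if "#" in tweet:
            return False
    return True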
Finally, after Otaro programmatically tests numerous prompts to pass our rules, it will return the best prompt in another YAML file.
desc: '"Create {tweet_count} tweets from the following {blog_content} with a {tone_preference} tone. Ensure each tweet is under 280 characters and contains no hashtags."'
inputs:
  - name: blog_content
    type: str
  - name: tweet_count
    type: int
  - name: tone_preference
    type: str
If you are using Otaro just to find an optimized prompt, you can copy and paste it into your app’s code.
But it is better to use Otaro in your app’s code. For example, I built a simple article-to-tweets app that is powered by Otaro.
Otaro will automatically apply the best prompt whenever my users click the “Generate Tweets” button. Even better, Otaro will always check the generated tweets against my rules. If a tweet doesn’t pass my rules, Otaro will automatically test more prompts and find a better prompt on the fly to produce better tweets.
This does mean my users might sometimes need to wait a little longer while Otaro optimizes the prompt. But this is better than asking them to keep generating, with an unchanging prompt under the hood, until they get a better result, which would be an annoying experience.
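To illustrate the behavior (a schematic only, not Otaro’s actual API; generate_tweets and optimize_prompt are hypothetical placeholders):
def generate_with_rules(prompt, inputs, rules, max_attempts=3):
    """Schematic of the generate-check-retry loop described above."""
    for _ in range(max_attempts):
        tweets = generate_tweets(prompt, inputs)  # hypothetical: call the model with the current prompt
        if all(rule(tweets) for rule in rules):   # run every rule against the output
            return tweets                         # all rules passed
        prompt = optimize_prompt(prompt, rules)   # hypothetical: search for a better prompt
    return tweets                                 # fall back to the last attempt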
If you want to try Otaro, you can find our open-source repository here and our documentation here.
Jargon explained
I have realized that a lot of “AI” work is really just Python coding. I had never written any Python (besides some beginner tutorials) until ChatGPT taught me how to write Python scripts. As I work more on AI, I have been exposed to more and more Python.
venv: While I have used virtual environments before, I never understood why until I built more things with Python and ran into issues running my code. The main purpose is to give each project its own dependencies, to avoid conflicts between different versions of the same package. For example, some programs require version 1.0 of Package A while others require version 0.9. Installing version 1.0 globally will break the programs that need version 0.9.
# To create a venv
python -m venv venv
# To enter a venv
source venv/bin/activate
requirements.txt is used to list the dependencies and their specific versions required by the Python program.
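For example, a minimal requirements.txt might look like this (the packages and versions are just illustrative):
# requirements.txt
fastapi==0.110.0
uvicorn==0.29.0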
# To easily install the required packages
pip install -r requirements.txt
# To generate a requirements.txt for your Python program
pip freeze > requirements.txt
Uvicorn is again something I have been using because tutorials told me to, but I never questioned what it was until now. Uvicorn is a popular ASGI server for Python web apps. It listens for HTTP requests, processes them with our backend (e.g. a FastAPI app), and returns the data to the frontend. It is like the bridge between the frontend and the backend.
# To start a Uvicorn server
uvicorn main:app --port 8000 --reload
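For context, the command above expects a file named main.py containing an app object. A minimal FastAPI backend that Uvicorn could serve looks like this:
# main.py
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def read_root():
    # Uvicorn receives the HTTP request and hands it to this handler
    return {"message": "Hello from the backend"}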
Ruff: After enduring my terrible Python code for a while, my cofounder nudged me to install Ruff. (I think this might have been the second time.) Ruff can identify and even help fix issues with Python code, such as syntax errors and potential bugs.
uv run ruff check # This produces a list of issues
uv run ruff check --fix # This also fixes simple issues automatically
I added the commands above because even though I have been asking ChatGPT for them, it is a lot faster to know them by heart, especially since I use them often.
Interesting links
Windsurf: My Cursor’s subscription was going to be renewed this week. But after seeing multiple tweets praising Windsurf over Cursor, I switched. Windsurf does feel better overall, besides the occasional error wall.
Varun Mohan, Windsurf CEO, explained why their context retrieval is better than other products.
Kevin Hou, Windsurf head of product engineering, shared how Windsurf works as an agentic IDE. His last point on scaling with intelligence reinforces the idea that AI agents shouldn’t have fixed workflows but powerful tools.
OpenAI’s Responses API: OpenAI launched a new API this week to make building AI agents easier. I played with it this week while building Stores and found it a bit easier to use than the Chat Completions API. Here’s the story behind the new API. (It will be replacing the Assistants API next year.)
Why MCP Won: Continuing the MCP hype from last week, the folks at Latent Space explained why it has become so popular.
1. For Operator, OpenAI trained Computer-Using Agent, a specific AI model for interacting with graphical user interfaces.