AI16: Building an AI workspace app
I'm looking for people who work in their desktop files, such as documents, slides, and images, multiple times a day to try our latest prototype!
While I have benefited from using AI in various aspects of my life, from general curiosity to writing to coding, most people around me haven’t been using the latest wave of AI technologies as much as I think would benefit them.
Even though ChatGPT is free, most people are not using it.
The issue lies more in the product design than in the users.
I was reminded of the early days of the internet and how things have changed. I was lucky to be a part of the dial-up era and grew up with the internet. But in the beginning, most of us didn’t know what to do with it. It was and is still a cool thing. But what should I do on the internet? Similarly, ChatGPT and other AI chat apps are amazing and can do so many things. But most of us still don’t know what to do with them.
Over time, developers built web apps and mobile apps that leverage the internet. Facebook, Slack, Zoom. Most of us use these apps without even thinking about how the internet made them possible. By this analogy, to enable more people to use and benefit from generative AI, we need to build apps that leverage generative AI without needing people to know how to use AI.
Using AI without knowing how to use AI
One of my favorite examples is Granola. It uses AI to enhance your meeting notes by adding details to your rushed notes according to the transcript and organizing them into sections.
Without Granola, we would:
Open up a notepad to prepare for the meeting
Try to jot notes while others in the meeting are speaking (usually in bullet points and short-forms with many typos)
Clean up the notes after the meeting
With Granola, we would still:
Open up a notepad to prepare for the meeting
Try to jot notes while others in the meeting are speaking (usually in bullet points and short-forms with many typos)
Clean up the notes after the meeting
But Granola’s AI would do the last step automatically for us. Based on the notes we have taken, it can understand what might be important to us about the meeting and update the meeting notes with information from the transcript.
What’s really impressive is that using Granola doesn’t require us to change our behavior at all yet we get better meeting notes. It uses AI but users don’t have to know how to use the AI, just like how Instagram, Google Docs, and WhatsApp use the internet but we don’t have to know how to use the internet.
Using AI while needing to know how to use AI
Contrast that with Windsurf or Cursor, two popular AI code editors.
They are incredibly powerful AI tools that help developers code faster and even let non-developers like me code. I rarely pay for software subscriptions but I subscribe to Windsurf.
But they are like ChatGPT. To use them well, you need to know how to use AI. Even though you chat with the AI in English, there are still quirks around how to phrase your requests so that the AI understands and executes the task well. The chat interface doesn’t tell us what can be done while the marketing often tells us it can do everything. We don’t know what cannot be done until we try. And when something fails, we can’t tell whether the tool cannot do it or we simply “prompted” it wrongly.
While I love these AI tools, we need to integrate AI into tools and workflows more seamlessly if we want more people to use AI. That said, I don’t necessarily think a chat interface is bad. We have already been chatting with terrible customer service chatbots for many years. As with Granola, when those chatbots get better AI, we don’t have to change our behavior, yet we get better results.
AI in our workflows
For the past few weeks, we have been building and exploring an AI desktop app.
While ChatGPT and friends are amazing at answering our questions, we cannot use them to do things, such as creating and editing documents, resizing images, or organizing files.
ChatGPT and Claude have desktop apps but you can’t use them with your desktop files. Yes, you could use MCP servers with Claude’s desktop app but most people will not even try to figure out how to do that. ChatGPT’s desktop app can edit only code.
What if AI is embedded directly in our workflows so that we…
don’t have to copy and paste to and from ChatGPT all the time,
and can actually use AI to do things with our files for us,
without needing to know how to use AI?
I made a simple dummy prototype this week to explore a possible interface:
In the left panel, you can see the folders and files you have granted the app access to.
The middle panel is the main workspace where you can preview and maybe edit files.
The right panel is where you interact with the AI agent to get it to work on your files.
We also have a functional prototype (which looks a bit different from the screenshot above) that can do things such as:
Research information online and create a Word document or PowerPoint with its findings
Organize files into folders based on the filename or content
Bulk rotate images or rename files
While these seem cool, it is the ChatGPT problem in another form. When the app can do many things, it isn’t clear what can be done.
I suspect it will be better if we focus on one core use case, at least initially, so that users know how to use it immediately. My initial hypothesis is that bulk actions, such as extracting information from multiple files into a spreadsheet or creating multiple research documents, would be valuable.
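To make the "extract information from multiple files into a spreadsheet" idea concrete, here is a minimal sketch in plain Python. It is not how our prototype works (there is no AI in it), and summarize_folder, the column names, and the sample files are all made up for illustration. It only shows the shape of a bulk action: many files in, one table out.

```python
import csv
import tempfile
from pathlib import Path


def summarize_folder(folder: Path, output_csv: Path) -> int:
    """Hypothetical helper: write one row per .txt file (name, size, first line)."""
    rows = []
    for path in sorted(folder.glob("*.txt")):
        lines = path.read_text(encoding="utf-8").splitlines()
        first_line = lines[0] if lines else ""  # empty files get an empty cell
        rows.append([path.name, path.stat().st_size, first_line])

    with output_csv.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "bytes", "first_line"])  # header row
        writer.writerows(rows)
    return len(rows)


# Demo on a throwaway folder so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "a.txt").write_text("Meeting notes\nItem 1\n", encoding="utf-8")
    (folder / "b.txt").write_text("Invoice 42\n", encoding="utf-8")

    out = folder / "summary.csv"
    count = summarize_folder(folder, out)

    # Read the spreadsheet back to see what was written.
    with out.open(newline="", encoding="utf-8") as f:
        table = list(csv.reader(f))
```

In an AI workspace app, the interesting part would be replacing the "first line" column with whatever the user asks for in plain English, which is exactly what a script like this cannot do.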
Want to try our prototype?
If you work with files on your computer every day, creating, editing, or moving them, I’d love to jump on a call and show you our prototype.
Or if you think you have a task that such an AI desktop app can do for you based on what I shared above, I’d love to chat too.
Just let me know!
Jargon explained
Yield: In Python, you can use yield to pause the execution of a function, return a value, and then continue the execution later. For example, looping over count_up_to(3) in the code below gives 1, then 2, then 3, similar to how ChatGPT streams text piece by piece instead of loading for a while and displaying the whole text at once. Using yield turns a normal function into a “generator”.
def count_up_to(n):
    i = 1
    while i <= n:
        yield i
        i += 1
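Here is a small usage sketch for the generator, mostly so I remember how to consume one. The function is repeated so the snippet runs on its own, and stream_words is a hypothetical helper I made up to mimic streamed chat output.

```python
from typing import Iterator


def count_up_to(n: int) -> Iterator[int]:
    """Yield 1, 2, ..., n one value at a time."""
    i = 1
    while i <= n:
        yield i  # pause here and hand back i; resume when the next value is requested
        i += 1


def stream_words(text: str) -> Iterator[str]:
    """Hypothetical helper: yield a sentence word by word, like streamed chat text."""
    for word in text.split():
        yield word


# Collecting everything at once.
numbers = list(count_up_to(3))  # [1, 2, 3]

# Pulling values one at a time with next().
gen = count_up_to(2)
first = next(gen)   # 1
second = next(gen)  # 2

# A for loop receives each word as soon as it is yielded.
words = []
for word in stream_words("hello from a generator"):
    words.append(word)
```

Calling count_up_to(3) by itself does not run the loop. It only creates the generator, and the body runs step by step as values are requested.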
Type hinting a function: You can use Callable to indicate that a parameter is expected to be a function. In the example below, Callable[[int], int] means a function taking one int and returning an int. (As a reminder, mostly for myself, Python does not check the type hints when the program runs. Type hints are used by our IDEs to warn us of issues.)
from typing import Callable

def apply_twice(func: Callable[[int], int], x: int) -> int:
    return func(func(x))
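A short usage sketch, with apply_twice repeated so the snippet runs on its own. add_three and shout are names I made up for illustration. The second call shows the reminder in practice: the hints say int, but Python happily runs the function with strings.

```python
from typing import Callable


def apply_twice(func: Callable[[int], int], x: int) -> int:
    return func(func(x))


def add_three(n: int) -> int:
    return n + 3


# Matches the hint: a function from int to int.
result = apply_twice(add_three, 10)  # add_three(add_three(10)) == 16


def shout(s: str) -> str:
    return s + "!"


# Does not match the hint (str instead of int), yet it still runs,
# because type hints are not enforced at runtime.
unchecked = apply_twice(shout, "hi")  # "hi!!"
```

A type checker or an IDE with type checking turned on should flag the second call, which is the whole point of writing the hint.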
Tauri 2.0: I built a Mac app with Tauri this week. I was using Windsurf and kept getting issues because Tauri 2.0 was only released last October while many LLMs were trained before that. So the LLMs kept producing code for Tauri 1.0. ChatGPT (o4-mini-high) gave me better code and fixes because it would search the internet and get more updated information.
Interesting links
The modern AI workspace: Why Cursor isn't just for coders anymore: Some of my recent ideas were inspired by this piece. But I do not think non-coders will use Cursor, simply because it is a code editor.
Welcome to the Era of Experience (by Google): Imagine AI that is connected to real environments so that it can evaluate its actions and adjust its plans. That might be the future.
ChatGPT o3 looked through 104 sources to help me find a quote that I vaguely remember. It iterated through 18 sets of queries and links until it found the right quote (which was in a podcast).
Our interfaces have lost their senses: An incredibly beautiful website exploring interface design, especially in the age of AI.