Changelog & Friends — Episode 2
A new direction for AI developer tooling
Elixir creator Jose Valim discusses Tidewave, a coding agent for full-stack web development.
- Speakers
- Jerod Santo, Jose Valim
- Duration
Transcript (235 segments)
It's Changelog & Friends, a weekly talk show about MCP hot takes. Thanks, as always, to our partners at Fly.io, the public cloud built for developers who ship. We love Fly. You might too. Learn more about it at Fly.io. Okay, let's talk. What's up, friends? I'm here with Kyle Galbraith, co-founder and CEO of Depot. Depot is the only build platform looking to make your builds as fast as possible. But Kyle, this is an issue because GitHub Actions is the number one CI provider out there, but not everyone's a fan. Explain that.
I think when you're thinking about GitHub Actions, it's really quite jarring how you can have such a wildly popular CI provider, and yet it's lacking some of the basic functionality
or tools that you need to actually be able to debug your builds or deployments. And so back in June, we essentially took a stab at that problem in particular with Depot's GitHub Action Runners. What we've observed over time is effectively GitHub Actions, when it comes to actually debugging a build, is pretty much useless. The job logs in the GitHub Actions UI are pretty much where your dreams go to die. They're collapsed by default. They have no resource metrics. When jobs fail, you're essentially left playing detective, clicking each little dropdown on each step in your job
to figure out like, okay, where did this actually go wrong? And so what we set out to do with our own GitHub Actions observability is essentially we built a real observability solution around GitHub Actions.
Okay, so how does it work?
All of the logs by default for a job that runs on a Depot GitHub Action Runner, they're uncollapsed. You can search them. You can detect if there's been out of memory errors.
You can see all of the resource contention that was happening on the runner. So you can see your CPU metrics, your memory metrics, not just at the top level runner level, but all the way down to the individual processes running on the machine.
And so for us, this is our take on the first step forward of actually building a real observability solution around GitHub Actions so that developers have real debugging tools to figure out what's going on in their builds.
Okay, friends, you can learn more at depot.dev, get a free trial, test it out, instantly make your builds faster. So cool, again, depot.dev.
I got like the news. Like I got the update in the podcast that there was like an Oxide event and you were there. Was that a thing? Like how do you do that where you go like to the place?
Yeah, this was a first for us, I guess. Cause that's like an internal conference for their company. And obviously we are not internal to their company. So that's a first for us, but you know, we've hit it off with Bryan Cantrill and with Steve Tuck, the co-founders. And they wanted us to experience the team and meet everybody. And we had been kind of ogling and gushing over how cool their server racks are for years. And of course I don't have enough money to buy one of their racks and neither does Adam. They're not gonna make a home lab version despite Adam's incessant cries for them to have an affordable, maybe a half rack. And so we've always wanted to see their hardware in real life. And so this opportunity presented itself. So we went, hung out, met a bunch of cool people and saw kind of inside their company, what they are up to, which was, it was cool. It was different.
Nice, very nice.
First time to Emeryville, you ever been to Emeryville, Oakland area in the Bay, Jose?
I don't think so. You don't think so? I don't think so.
Right across the street from Pixar, like the Pixar headquarters.
I actually didn't even know like Pixar was around that area, so.
I didn't either until I looked across the street and there was Pixar.
Yeah. Isn't that area in Oakland where, like, some startups are starting to move? Is that the thing you're seeing, or am I getting things mixed up?
I don't know, honestly, I think so. But we're outsiders, you know, we get invited to the Valley from time to time.
Yeah.
And eventually we say yes.
You know, it's a cool experience, I think. I think it depends on the different places
we get invited to, but I think it's a chance to explore the world and meet some cool people, peel back the layers, tell some cool stories. So I favor the IRL. I think it's cool to do it a few times a year, or as often as it makes sense, some version of that. We usually go to All Things Open, but our schedule conflicted this year. What about you, Jose? You get out and see the people?
Yes, and I used to do a lot of that, especially with Elixir at the beginning, go to a bunch of different conferences and just talk about Elixir. And then of course, at some point it gets very exhausting, and I ended up like, okay, I'm going to focus on the Elixir community from now on. And even within the Elixir community, there is the adjacent Erlang community as well, which is enough to keep a person busy. But now with Tidewave, we support Phoenix and Rails, we are working on Django and other frameworks, and I have started to go back, for example, to Ruby conferences. So last week I was at Euruko, which is one of my favorite conferences. I don't know, you were Ruby folks, weren't you?
So I'm familiar with that. I haven't been to it, but I'm familiar with most of the RubyConf's.
Yeah. I don't know if I've said this elsewhere already, but for the listeners: what I really like about Euruko is that every year people say, look, I want to host it in my city. They do like a three-minute, five-minute presentation, and the people attending the event vote on where it's going to be next year. And it's usually somebody with no experience organizing an event who now has to organize an event for like 500, 600 people. It's probably very daunting, but I think it keeps it fresh and keeps it community centric, right? Because it's always moving around. So yeah, I was at Euruko, and for the Elixir events, in two days I'm going to the GOTO conference in Copenhagen. So yeah, I'm kind of back in traveling mode for now.
You like that or do you just do it because you have to do it?
I'm enjoying it right now, because I think one of the things with everything that is happening around AI and coding is that nobody really knows where it's going. People tell you it's going to go there, but nobody really knows, right? Like, I think the CEO of Anthropic made a prediction about 90% of code being written by AI within six months. Six months have passed; it has not been 90% of code being written by coding agents. So I'm really enjoying this opportunity of talking to different people, and you get a bunch of different takes and different ideas and things to explore. So it has been really fun just going out and talking to people, but I'm sure I'm going to do it enough that at some point, maybe in six months, I'll be like, okay, I've had my fill, it's time to hibernate again and go back to the Elixir conferences. But right now it has been really fun.
I agree, I think it's fun to step back for a while and become a recluse and enjoy your local world. And then to come out, peek your head out from underneath the rock and see the people again, there's something invigorating and exciting about it. But when you're just constantly on that track of just like travel, travel, travel, conference, conference, conference, it can tend to burn out. So I think everyone needs to step back, but then also step out and see some people because that's where the magic happens, isn't it Adam? I mean, that's where the real relationships actually form.
I think so, you know, the IRL is really where it's at. I heard that somewhere and I liked it. Then I experienced it and I was like, you know, just give me more, please, nonstop.
Well, trying to figure out where the AI thing is going: Claude 4.5 dropped today. I don't know if anybody has played with it yet. I have not, but you know, better, stronger, faster, still not writing all the code for us.
I think I did use it today.
But they said something like it can go 30 hours on a coding bender. I just thought, well, that's really good marketing, because I have no idea if it's true or not, but that's a great way to describe what it can do, which is more than I can last. I don't know, Adam, how long you can bend, or Jose, I'm sure you've been on some benders in your life, but 30 hours, holy cow, man.
Yeah.
So that would be stronger. I'm trying to think of where it fits in the category of better, stronger, faster. Stronger, man, 30 hours straight. I think it doesn't lose context or something. I don't know. Did you read this?
Maybe they are doing more things. So right now, they do the auto-compaction of the chat, and context engineering is all the rage right now, right? They do the automatic compaction of the conversation, which is summarizing it. But something else they do is that when the context is getting filled, they just prune the tool outputs from the beginning. So if there are some files or some searches or commands you ran at the very beginning of the conversation, they prune that. And that also allows them to go for long without having to summarize stuff, because when you summarize, there's always a chance that you are losing some data. And something that I do a lot is have conversations with the agent about this stuff. A lot of people say, oh, this agent doesn't tell me which tools it has available. But you can always just ask the agent which tools it has available. And then something I do is ask it to invoke echo with, you know, 100,000 characters, three times, so I can force it to fill the context, and then ask: what do you see now in that tool output that was invoked? And then it's like, oh, the tool output disappeared.
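The pruning idea described here can be sketched in a few lines. This is a hypothetical illustration, not Anthropic's actual implementation; the message shape, the rough 4-characters-per-token heuristic, and the function names are all assumptions:

```python
def estimate_tokens(text: str) -> int:
    """Very rough heuristic: about 4 characters per token."""
    return len(text) // 4

def prune_tool_outputs(messages: list[dict], budget: int) -> list[dict]:
    """Drop the oldest tool outputs until the transcript fits the budget.

    User and assistant turns are kept intact; only 'tool' messages are
    replaced with a short placeholder, oldest first.
    """
    messages = [dict(m) for m in messages]  # don't mutate the caller's list
    total = sum(estimate_tokens(m["content"]) for m in messages)
    for m in messages:  # oldest first
        if total <= budget:
            break
        if m["role"] == "tool":
            total -= estimate_tokens(m["content"])
            m["content"] = "[tool output pruned]"
            total += estimate_tokens(m["content"])
    return messages
```

The point of pruning instead of summarizing is visible here: user and assistant turns survive verbatim, so nothing is paraphrased and nothing can be paraphrased wrongly.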
It's surprised by itself, like, it now sees this thing.
And you can trick it into crashing very easily as well. You say, oh, why do you have this tool? Wouldn't this other tool be better? And then you just ask, would this tool be better? It imagines that the tool exists, and it's like, yeah, it would be better, let me try calling it. But it obviously doesn't exist, and then the agent crashes. So I like having those meta conversations where they get surprised or tripped up.
Yeah, it's fun. It's almost like talking to a kid, you know? It's just very easy to pull the wool over a kid's eyes. They're just gullible, because they don't have the life experience that we do. And you can have a lot of fun, as long as you keep it in good-natured fun and not actually trick a kid. But with your AI, who cares? It's a robot; trick it all you want, Jose, get it to do all kinds of stuff. Have you heard, we just learned this from Feross, that people are actually using prompts in their malware now? So if you can get arbitrary code execution on someone's computer, for instance, this was in the case of Nx, which is a monorepo command-line tool. They hacked the Nx npm package and distributed some malware, and if you're running the Nx command and you're infected, in there was an actual prompt asking Claude Code to do stuff for it instead of, you know, coding it out. And it was really kind of smart, because what they asked it to do were the things that are kind of fuzzy for programs, which is: find all the interesting files on this computer. Of course you could have a list of where the interesting files are and search for certain things, but Claude Code can just go do tool calls, read stuff, and hand back a list of interesting hackable files, you know, secrets and whatnot. Anyways, I just found that to be amusing.
So even the attackers are getting too lazy to write their own code; that's why they're prompting.
Exactly, like why? Like this is the promised land, isn't it? You know, you don't have to code anymore when you're hacking someone's computer.
Yeah, I wonder why that was the best route. Was it because of their laziness or their lack of desire to write that script or just because they were just trying to leverage, you know, a Claude code enabled developer's machine? Like, what do you think the true psychology of that choice was?
I think they're just thinking this is the fastest way to the best result. You know, just like most programmers are like, well, what's the fastest way to the best result? Well, I could write a program. And besides, I only have so much stuff I can shove in; I'm assuming the more stuff you put in, the more likely you are to be found, so maybe some compression is in there. But if I can just prompt something to scour your computer for interesting files, and it's pretty good at it, that's a lot faster than me having to write a program that scours your computer for interesting files. That's my guess; I don't know what you're gonna say. Why do you think somebody might do that? Maybe they're just showing off.
Yeah, I don't know.
They just want to trick a computer, you know? They want to trick an AI. Yeah. In their nefarious deeds. So, okay, you like to mess with them. How much value are you getting? Because a survey says that we're getting tons of value, but quantified research says that we're not. I don't know if you've read any of the research, but a lot of recent papers, a lot being at least more than one, have come out and said developers think that they're more productive with AI coding tools, but it's actually slowing them down. What are your thoughts, Jose?
Well, I have many thoughts on this. The first one is that nobody's surprised; we're really useless at estimating stuff. We've been proving that for years, haven't we? So of course we are estimating this wrong too. And I feel like people frame it in this exaggerated way, like, oh my God, I'm three times more productive, or even twice. For me that's kind of pointless, because if you're even a third more productive, 33%, that's kind of massive, that's huge, right? And then I think people fail to consider, there are other studies, I don't remember the exact number, that say we spend like 50% of our time coding, let's say, right? And if you're using agents for coding, of course, the productivity gain is only where you're using agents. If you're only using agents for coding, you can only optimize that 50% and not all the other things. And then the other thing is that a lot of people don't consider the time that they lose when something doesn't work, right? Everybody's happy like, oh, I used the agent, it worked, I was super productive. But there are a lot of times where it's just not productive, and you end up trying to coerce it to do the right thing and it doesn't do it, and then you quit, right? And then you try again later and it works, and you completely forget about that bad experience. The bad experience is actually one of the reasons why I never liked AI completion suggestions, because I would read it, and if it's not what I want, it would always throw me out of my loop. That time where I read it and I'm like, ah, damn, I lost my flow. How do you measure that, right?
If you're only measuring, oh, it was accepted two thirds of the time, but the one third was so disruptive to me... So with all that said, I think I do get a benefit from it, right?
I think.
And I think.
Those are my six caveats, but I still think I do.
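The 50%-of-time point is just Amdahl's law arithmetic. A quick sketch (the 50% figure is the ballpark quoted in the conversation; the function name is mine):

```python
def overall_speedup(coding_fraction: float, coding_speedup: float) -> float:
    """Amdahl's-law style: only the coding fraction of your time gets faster.

    coding_fraction: share of total work time spent coding (0..1)
    coding_speedup:  how much faster the agent makes that coding time
    """
    return 1 / ((1 - coding_fraction) + coding_fraction / coding_speedup)

# If an agent makes coding 2x faster and coding is 50% of the job,
# the overall gain is 1 / (0.5 + 0.25), about 1.33x, the "third more
# productive" range Jose calls massive. Even an infinitely fast agent
# caps out at 2x when half your time isn't coding.
```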
Citation needed, right? I joke that we could use the Wikipedia "citation needed". It should be an HTML feature. We should just be able to put that everywhere, you know, like in conversations.
And after every sentence that I say.
Yes. Yeah. Because the other thing is.
It's Bixby. Is that Samsung's thing? Silicon Valley. Oh gosh. Sorry. Oh yeah. Continue. You know I was not gonna get that. You were hoping Jose was gonna get it, weren't you?
No, yeah.
Yes. All right, well, some people have got it.
Yeah, we can talk about Silicon Valley later, but yeah. So, yeah, you just did the AI completion.
He just auto completed the wrong thing.
Perfect.
In the reels. In the reels. He lost his limit. Okay, he's back.
Context switching. So there are a couple of things that I do, and I think you have to find where it works and where it doesn't work. And of course it's going to change as those things improve. So for example, I tried a couple of times to have it help me with Elixir type system stuff, and it doesn't work. It's useless. I'm not going to try again; maybe in six months, maybe in a year, things change enough that it can help me with that kind of work, but I don't feel it's there. But for example, when working on Tidewave, because it supports other web frameworks, I often implement the feature in Elixir.
Tell the people what Tidewave is real quick so that
Yeah.
The three of us know, nobody else knows. What's Tidewave?
Yes, so Tidewave is a coding agent for full-stack web applications. Okay, I'm going to summarize it; we can jump into it later. But the idea is to have a coding agent that is tightly integrated with your web framework. So it understands what is in the DOM and how that maps to a template. It can control the browser, so it has a really strong verification loop. As you ask it to build features, it can verify that the features work. You can interact with the actual web page and ask for changes on the page instead of asking for changes in the code. I have this whole idea that we should run coding agents on top of what we produce. So if I'm working on a library, what I produce is API docs, fine, run it in an editor. But if I'm building a web application, I want to run the coding agent in the actual browser, because I want it to understand what I produce, right? And I want to be able to interact with what I produce, because if I can't do that, we are doing boring translation work all the time. Like, I look at what happens on the screen, go to the editor, ask it to change things, right? And then the agent says, I'm done. You reload the page, there's an exception. You have to copy and paste the exception back to the agent. You don't want to do this boring stuff, right? And I say the data science folks were the first ones to notice this, because they were the first ones to put coding agents inside notebooks, right? Like, okay, let's run this thing inside a notebook, because if it understands my variables, if it understands my cells, it's going to be more productive. But nobody caught up to that trend, right? We kind of regressed; we first put it in the editor and then we put it in the command line, right? We should be going up, right? So that's Tidewave.
And I do think Tidewave can help you be more productive with AI, because allowing the agent to verify what it builds is going to make it build better things, things that are guaranteed to work, and you're going to spend less time on that loop, right? So when I'm working on Tidewave, we support Phoenix, we support Rails, we are working on Django, Next.js and a couple of others. I usually implement the feature in Elixir, then tell the agent, hey, I implemented this feature in Elixir, now do the same thing in the Rails project. And there are a couple of things that I tell it, like, don't add tests, don't use mocks, right? So there are some tweaks in there, but I just added-
Wait, you say don't add tests and then you say don't use mocks? I mean, if it's not writing any tests, why is it using mocks?
Sorry, don't add additional tests beyond the ones I wrote, because the Elixir PR is good, I wrote it, right?
Right, it has a test in there. So it's copying those over.
It has the proper tests, yeah. Because, I think my experience with coding agents for coding is way better than for testing. For testing, not in Elixir, but since I'm doing a lot of Ruby and Python, it tends to use mocks a lot and just writes a bunch of redundant tests. So that's a whole separate discussion. But it's really good: when I ask it, take this PR here, translate it to this repository, a lot of the time it's just perfect. It's done, it runs the tests, it runs the linter, and I can just push it, right? I send a PR, people review, so that's really good. So I think that's one of the things: you have to figure out where it works and where it doesn't, take notes of that, right? And find the loops and tools that make it work for you, and then it's like any other tool. And I think AI has this particular problem, that it's kind of like a lottery in some sense. Some people go try AI, and because it's probabilistic, they get a bad experience and they're like, oh, this sucks, and they never try it again, because people come with the expectation that it's just going to work. And then some people, trying it for the first time, just by the randomness of it, have a good first experience, and then they start investing in it and refining it, right? And that's the process: you do have to figure out what is there and what isn't. And then the other thing that I tell people to do, which works really well for me: I don't correct the agent. If it does something wrong, or if it's like 78, 80% good, I just go and finish it myself. That's fine. It depends. So I do two things. Imagine that I ask it to do this thing, and then I leave, and I come back like, oh, this sucks.
I'm not asking, oh, you were supposed to do this instead, because often when it does something wrong, it's there in the context, it has a really- It's going to keep getting it wrong. Yeah, and then when it fixes, it doesn't fix everything. So when it does something wrong, I usually start a new chat. I just discard everything, right? Nobody's going to be upset. I just discard everything. It's like, okay, start again, but do this, this, and don't do that. I add a little bit more context, and then if necessary, I start again.
So you start fresh with additional little warnings or instructions.
Yeah.
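The restart-instead-of-correct workflow Jose describes is easy to sketch. Here `run_agent` and `looks_good` are hypothetical stand-ins for your agent call and your own review; the point is that every attempt starts a fresh conversation carrying the accumulated notes, rather than correcting a chat that already has the wrong approach in its context:

```python
def retry_with_notes(task: str, run_agent, looks_good, max_attempts: int = 3):
    """Re-run a task in a fresh chat, adding a constraint per failed attempt.

    run_agent(prompt) -> (result, note)  # note: a lesson from this attempt
    looks_good(result) -> bool           # your own review of the output
    """
    notes: list[str] = []
    for _ in range(max_attempts):
        prompt = task
        if notes:
            prompt += "\nConstraints from earlier attempts:\n" + "\n".join(
                f"- {n}" for n in notes
            )
        result, new_note = run_agent(prompt)  # fresh conversation every time
        if looks_good(result):
            return result
        notes.append(new_note)  # e.g. "don't add extra tests"
    return None
```

Discarding the chat costs nothing, and the new prompt starts from a clean context instead of one that contains the failed attempt.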
Adam, you do the opposite, don't you? You never write the code. You just keep telling it to do stuff. Yeah, I don't really.
Yeah, I think by and large it's writing code.
I can't really write any, you know, myself anyways. So it's like, it's going to do a better job than I'm going to do. Jose is a better programmer than both of us. So he can just fix things. We're just different people, you know? Yeah.
So what are you using it for?
CLI tools, really. I'm having fun with a Proxmox CLI where it instantiates a virtual machine with a given cloud-init image. And it's a command line away, basically. It's cool. So I can spin up a new server immediately, essentially. I can package it as a server, and you can share it as a Git repo. It's kind of cool. That, and I would say 7zarch, which is, you know, 7z is the compression tool. So I was working on a version of that as a CLI that's just cooler, basically, because 7z's existing command structure is just kind of not a lot of fun.
It's hard to remember. I always forget it. It's highly configurable. And so I wrote something that was just more fun.
So does it wrap it and then call it underneath the hood with specific flags?
Yeah, essentially. It did that for a while. And then we essentially rebuilt something called lib7z, which is a wrapper. I think there's a 7z Rust crate out there. So it actually acts as a library around 7z, essentially, and then you can write a CLI layer on top of that, because it's a library. So that's where it's currently at right now.
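The first approach Adam describes, a friendlier front-end that shells out to the real `7z` binary with the right flags, might look something like this sketch. `7z a` and `7z x -oDIR` are real 7-Zip invocations; the wrapper function names themselves are invented for illustration:

```python
import subprocess

def archive_cmd(output: str, *paths: str) -> list[str]:
    """Build the `7z a` command line to create an archive."""
    return ["7z", "a", output, *paths]

def extract_cmd(archive_path: str, dest: str) -> list[str]:
    """Build the `7z x` command line to extract into a directory."""
    return ["7z", "x", archive_path, f"-o{dest}"]

def run(cmd: list[str]) -> int:
    """Run the built command; requires a `7z` binary on PATH."""
    return subprocess.run(cmd).returncode
```

A wrapper like this only gets you the flags you remembered to expose, which is presumably why they moved to the library approach: a library API gives per-file progress and richer data than parsing the CLI's output.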
That's cool. So you don't have to actually shell out. You're actually re-implementing the functionality precisely with a Rust crate.
And you get a lot more data in that API as well.
Like you get a lot more granularity around like files and process and progress, and you can control all the UX around the CLI that way.
We deal with a lot of large files and folders.
Yeah, so I'm just sort of enamored with archiving them very well.
Archiving it to the best of its ability.
Yeah.
Well, friends, it is time to let go of the old way of exploring your data. It's holding you back. But what exactly is the old way? Well, I'm here with Marc Dupuy, co-founder and CEO of Fabi, a collaborative analytics platform designed to help data explorers like yourself. So Marc, tell me about this old way.
So the old way, Adam, if you're a product manager or a founder and you're trying to get insights from your data, you're wrestling with your Postgres instance or Snowflake or your spreadsheets, and you don't maybe even have the support of a data analyst or data scientist to help you with that work. Or if you are, for example, a data scientist or engineer or analyst, you're wrestling with a bunch of different tools, local Jupyter notebooks, Google Colab, or even your legacy BI to try to build these dashboards that someone may or may not go and look at. And in this new way that we're building at Fabi, we are creating this all-in-one environment where product managers and founders can very quickly go and explore data regardless of where it is, right? So it can be in a spreadsheet, it can be an Airtable, it can be a Postgres, Snowflake, really easy to do everything from an ad hoc analysis to much more advanced analysis if, again, you're more experienced. So with Python built in right there and our AI assistant,
you can move very quickly through advanced analysis. And a really cool part is that you can go from ad hoc analysis and data science to publishing these as interactive data apps and dashboards
or better yet, delivering insights as automated workflows to meet your stakeholders where they are in, say, Slack or email or spreadsheet. So if this is something that you're experiencing, if you're a founder or product manager trying to get more from your data or for your data team today, you're just underwater and feel like you're wrestling with your legacy BI tools and notebooks, come check out the new way and come try out Fabi.
There you go. Well, friends, if you're trying to get more insights from your data, stop wrestling with it, start exploring it. The new way with Fabi. Learn more, get started for free at fabi.ai. That's F-A-B-I dot AI. Again, fabi.ai.
I also use coding agents for things that I'm not reviewing, particularly for prototypes. And that part has been really fun, because if you're working on a product, you have ideas of, wait, where could this product go, which directions could it go in the future. But usually, before, you would think about it, put down some notes, right? And then maybe, if you're lucky, in two, three months somebody from the team can take a look at it and give feedback, right? And now with agents, you can just say, okay, go for it, implement this thing. And as I was saying, I have this idea that coding agents should run on top of the thing that we produce, right? We talked about Tidewave Web, which works for web applications; I talked about notebooks. Well, if I'm working on a game, I want to have Tidewave running in the game engine. If I am building a mobile app, it needs to know about the mobile device, simulators, and all this kind of stuff. So for four or five weekends straight, what I would do during the weekend was come to the computer from time to time, see if the agent was working, and just have it build a different proof of concept of embedding Tidewave somewhere completely different. Like, oh, what would a Tidewave browser extension look like? Which capabilities do we get from this? And when I had to do this kind of thing for other products, like we were doing for Livebook, it would take a really long time to validate all those things. And now I can very quickly explore something different, get the lessons learned, and provide a way better blueprint for the team to work on.
Do you run it in YOLO mode, or whatever the equivalent is, where it's just doing whatever it wants to do and you come back every once in a while? Yeah. Yeah. Yeah, totally. So have you considered a notifier, a text-me-when-you're-finished kind of a thing? Otherwise you're gonna keep coming back. Are you done yet? No, it's not finished. Oh, it's been done for two and a half hours, but I was watching TV.
Yeah, so in this case, cause it's the weekend, I don't care in the sense that I don't want to be also interrupted, right? Like it's not my priority.
Gotcha. So when you feel like it, you go over and check it.
Yeah. Otherwise I'm using the notifications, right? I use Zed a lot, and Tidewave, and they all have notifications. And then I'm kind of listening, waiting for them.
Oh, they don't like push to your phone or anything?
Tidewave doesn't, and I don't think Zed does either.
Cause then you don't have to wait and listen. You can be out on a walk or whatever and be like, oh, it's done. Maybe even like.
Yeah, but if I'm on a walk.
Give it its next task, you know? Yeah, right. You don't want to walk in?
Yeah, like, so it's funny because I talked to Chris McCord about this, right? And it's like, oh, maybe I am on a coffee. I'm out to get a coffee. And then I'm like, look, if I'm out to get a coffee, I'm out to get a coffee. You know, it was like, if it's done, I don't care. I'm out, you know, it's.
Right. Yeah, same.
But I do.
But then, maybe you lose three hours of productivity, man. All you gotta do is tell it to keep going. You know, it's trade-offs. I get it. It's the weekend. I like to unplug as well, but I don't do any of the stuff you're talking about. I don't have anything coding for me over the weekend. So if I did, I'd at least want to be a good babysitter, not a neglectful babysitter. But to each their own, I guess. So you're talking to Chris. It sounds like you and Chris... are you guys competitors now? I mean, doesn't Chris have phoenix.new, and isn't this like, there can be only one, Jose?
Right. So... Yeah, and that's why he's not coming on the show anymore. No, I'm kidding. No, we do talk a lot about those things, and we are still bouncing many ideas off each other. The way I think about this is that there's a very easy way to separate those things: phoenix.new is remote. So maybe we should go deeper into Tidewave, because there is a bunch of additional context here. As I was saying, Tidewave is a coding agent for full-stack web applications. But the thing is that it runs on your machine. So one of my ideas is, we are looking at bolt.new, lovable.dev, right? And they have all those things where you can click around and ask it to do changes, but they want to kind of own your code; they don't want you to be responsible for the code. And most of the time it's for front end or for React apps. And I'm like, I want that for my Phoenix app that I run on my machine, right? A lot of people are pushing AI and those app builders that run in the cloud, right? With Tidewave, you are accessing localhost. The way you would install it is that you add the Tidewave package. Today it's for Phoenix or Rails, or in the future for Next, Django. So you just install the package. After you install the package, you go to your application, localhost, whatever, port 4000, and you do a /tidewave. And then the agent is running there in the browser, with your web app running on the side. And now you can do all those things. You can go to the inspector, click an element and say, hey, on top of this element, I want you to add a chart of the most listened podcasts in the last month, right? So you can be very UI driven, and everything is running locally. So for the people who say, look, I want phoenix.new to be responsible for my code, for my deployment,
I don't care about that, and I want that thing to do everything for me, then go use phoenix.new. I still think it also owns the getting-started experience. It is the best way of getting started with a Phoenix app, right? Just go put things in the prompt; it's going to build something for you that you can throw away, right? And for me, I'm like, look, okay, I have my own thing. I already have my own infrastructure, my own development cycle. And I want to incorporate all those tools into what I do every day. For me it's like, you know, there was a trend where everybody was saying, oh, you're all going to be developing on remote machines, and then there were those dev containers, and that never really happened. Right?
I remember we did the show, didn't we, Adam? We did the show with whoever it was... GitHub Codespaces.
Cloud development environments, essentially. Yes. Dev containers. Yeah, I know some people use it. It is used, and you can use it locally, but it's not like everybody uses it, even though people would say that everybody would, right? Like, why would you have a local machine, right? So I see it the same way. I want those tools for my framework, running on my machine.
Okay, I am with you. Actually, when I heard Tidewave runs in the browser, I was like, another browser thing, Jose? Like, they're all running in the browser. But actually it's different than that, right? It's in the browser because that's where the output of your web app goes, but it's in your local browser, running against your local web server with your local environment, and helping you build cool stuff right there, which is kind of how I develop now anyways. Whereas phoenix.new was making me go into the browser and have a remote browser session, which I always get excited about for the hour that we do the show. And then when I go back to my real life, I just don't want to do that. I want to be on my local machine. I always have; maybe I always will. I'm getting old, so I'm getting stuck in my ways. So that makes me like Tidewave a little bit more than when I first thought, because one of my questions for you was gonna be: why the browser, you know? But it's 'cause I didn't understand.
Yeah, and the thing is, we actually went through many possible designs. So let's talk a little bit about the browser design. We already had, for some time, the Playwright MCP. So somebody may be listening to this and say, well, I can use VS Code with Copilot and install the Playwright MCP. Recently-
Chrome DevTools MCP.
Chrome, yeah, they released theirs. And I think yesterday Cursor's browser came out.
What's that?
It's just controlling Chrome. It is like the Playwright or Puppeteer MCP, just built in, right? And the issue with those tools is that it is a separate browser session. It's not the one that you are developing in. So imagine, for example, that you are working on a project manager, and you need to implement the feature for transferring a project between two organizations, right? In order to implement this feature, you need to create a user, create two organizations, probably make sure that the user is admin on both organizations, create the project, and then you can transfer it. And a lot of the time the MCP is going to get stuck just in this process. Like, a lot of the time the MCP cannot create an account, because creating an account requires sending something to an email that the MCP doesn't have. So now we start writing those back doors for tasks. There's a big amount of work, right? And the fact that we run in the browser means we are literally running in your browser session. Because when you're going to develop the feature, right, you are already logged in to your development version. You go to the page already. And when you're going to validate that the feature works, you already have all that set up. Because it's running there in your session, everything that you do for development, the agent can do. And the agent is going to verify things in front of you, not in a separate session. And then you can actually have a back and forth. If you're using the MCP, imagine the agent is like, okay, let me test that it works. And then the MCP with the separate browser is running, and you see a bug right when the thing is testing. How are you going to debug that? Because it's a separate browser. How are you going to click things and say, hey, around this page, maybe there is a bug here? With Tidewave, it is your browser.
I think that's the most important thing. You can stop the testing, go with the Tidewave inspector: there is a bug here, fix it. And we also go the next step, which is that we integrate with the web framework. When you inspect a DOM element, we know the DOM element and send it to the agent, but we also know where in the template, or which React component, that thing came from. And we send that to the agent. So you don't have to do the manual work of figuring that out and passing it to the agent. When there is an error page, we detect the error page for all the web frameworks we support and automatically feed that to the agent. So it's really meant to be like: you, the agent, the browser, the web framework, in a shared context. Everybody can see what the others are doing, because otherwise it becomes your responsibility. You are the one getting information from all those places and passing it around.
Sounds pretty cool, man. That sounds pretty cool. Bypasses a lot of that stuff. I mean, that's something that I've always wanted, I guess. You're going back and forth like that; it's better to do it right there, real time. I haven't played with it to know the UX of that, really. Like, when you're fiddling with, let's say, a button. Maybe it's not working properly. What is the back and forth of the experience? Can you speak to it? Can you type to it? Like, what are some of the interfaces you can think of?
Right, so I think there are three ways that you are interacting with it. One is the usual chat prompt, right? With the difference that we know which page you're currently looking at. So you can talk to the page, in the sense that, for example, imagine you just boot up your dev instance and your database is empty. You can go to a page that is listing all the podcasts, like for Changelog, and you can say, oh, this page is empty, add some podcasts. It knows which page it is at, so it can find information from the controller or from the LiveView and then say, okay, that's the data I need. It gives you an entry point. So that's the chat: it has the context of the page. The other one is the inspector. It's like the browser inspector, but you can click it and then mouse over elements. We show the DOM element; we also show which template, or which Phoenix template, it came from. And then you can click it to open your editor, or you can click it to ask the agent to do something. And the other way that we interact is when we detect that something goes wrong, we just show a pop-up: oh, you want to fix it, right? And then you can just click a button and have it fixed for you. So as a human, those are the three, though I may be missing some. It's a very classic chat experience with a few things on top, like the inspector and the errors. But I think a lot of where we shine is in giving more tools to the agent, right? The agent can do everything that a coding agent can do, but it can also run JavaScript on the page. And that's how the agent can test what it implements. So for example, one of the coolest features that we used Tidewave to implement: if you go to tidewave.ai today, we have videos on the homepage. So I added the video tags with the URLs, right?
And then I wanted to make it so that as I was scrolling through the page, the videos started to autoplay. So I asked Tidewave to implement this, which it can do. It's a straightforward feature. I can't do it, but I assume it's a straightforward feature.
I didn't look at the code, but I'm sure it was pretty easy.
Yeah, so it implemented the thing, right? It implemented the thing. And then, in order to prove that it worked, it actually reloaded the page. So Tidewave wrote JavaScript code to reload the page and scroll to the first video. Then it ran some JavaScript to validate that the first video was playing, but not the other two. Then it automatically scrolled a little bit more, so the second video started playing, and it ran some JavaScript to make sure that the second video was playing and not the other two. Right, and I think that's the important part, because if the agent doesn't do that, there is a chance it gets it wrong, right? And if it gets it wrong, who is paying the price to fix it? It's you, because you are going to be the one who tests it, and then you have to go and tell it, right?
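The check Jose describes boils down to a few lines of DOM scripting the agent can emit. Here is a minimal sketch of the visibility math such a check might use; the helper name and the 50% threshold are my own illustration, not Tidewave's actual code. The pure function runs anywhere, and the commented lines show how it would be wired to the real page:

```javascript
// Visibility math behind "is this the video that should be playing?".
// rect is { top, bottom } in viewport coordinates, like the object
// returned by getBoundingClientRect(); viewportHeight is window.innerHeight.
function mostlyVisible(rect, viewportHeight, threshold = 0.5) {
  const visibleTop = Math.max(rect.top, 0);
  const visibleBottom = Math.min(rect.bottom, viewportHeight);
  const visiblePx = Math.max(visibleBottom - visibleTop, 0);
  return visiblePx / (rect.bottom - rect.top) >= threshold;
}

// In the page, the agent's validation could then be a couple of lines:
//   window.scrollTo(0, video.offsetTop);
//   const ok = mostlyVisible(video.getBoundingClientRect(), window.innerHeight);
//   console.assert(ok === !video.paused);
```

Scrolling further and re-running the same check against the next video is exactly the "second video playing, not the other two" loop described above.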
Well, I thought you were gonna say your users, because you're gonna push it out live.
Awesome, could be, could be.
And then your users have to tell you if it's broken or not. Yeah. How do you limit it to the viewport? Like, I assume the scrolling is either simulated or it's real, or it's simulating it so that it's only scanning what you're seeing.
No, it just runs JavaScript on the page.
So what's in the viewport? It's looking at what you're seeing essentially.
Yes, yes, it's running in your browser. There are a lot of complexities in there, but this part is as straightforward as it can be. It can control the page. Because that's the thing: people are coming up with all those different APIs to have the agent control the page, like an MCP with 30 different commands. And I'm like, it knows JavaScript, it knows the DOM API, just have it run things on the DOM. It knows what the command is to say, hey, scroll a little bit, right? It knows. So the only thing is, we had to intervene very little. One of the things it can't do is resize the browser window. I think it's because browsers don't allow you to do that, because of security concerns or something like that. So there are some things where we have to intervene and add extra capabilities, but mostly it's just running things on the page.
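To make that point concrete: instead of a vocabulary of bespoke commands, the agent emits ordinary DOM calls it already knows. A hedged sketch, with a tiny stubbed `document` standing in for the real page so it runs outside a browser; in Tidewave the same calls would hit your actual session:

```javascript
// A minimal stub of a page: two <video> elements, one playing.
// In the browser, `document` is the real DOM and this stub disappears.
const document = {
  elements: [
    { tag: "video", paused: false },
    { tag: "video", paused: true },
  ],
  querySelectorAll(tag) {
    return this.elements.filter((el) => el.tag === tag);
  },
};

// "How many videos are currently playing?" is a query plus a filter,
// no special tool vocabulary needed:
const playing = document.querySelectorAll("video").filter((v) => !v.paused);

// Against a real page, the agent would emit the same kind of one-liners:
//   window.scrollBy(0, 400);                   // "scroll a little bit"
//   document.querySelector("h1").textContent;  // read what's rendered
```

The design choice is that the "API surface" is the whole DOM API, which the model was already trained on, rather than a new command set it has to learn from tool descriptions.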
Well, friends, you don't have to be an AI expert to build something great with it. The reality is AI is here, and for a lot of teams that brings uncertainty. Our friends at Miro recently surveyed over 8,000 knowledge workers, and while 76% believe AI can improve their role, more than half still aren't sure when to use it. That is the exact gap that Miro is filling. And I've been using Miro, from mapping out episode ideas to building out an entire new thesis. It's become one of the things I use to build out a creative engine. And now with Miro AI built in, it's even faster. We've turned brainstorms into structured plans, screenshots into wireframes and sticky notes, chaos into clarity, all on the same canvas. Now you don't have to master prompts or add one more AI tool to your stack. The work you're already doing is the prompt. You can help your teams get great done with Miro. Check out Miro.com and find out how.
That is Miro.com, M-I-R-O.com.
Can it take a, I wouldn't say a fixed-width, but a desktop-designed website and implement it so it looks how you want it to look on desktop? And can you say, make this a progressively... what's it called? Not progressive web enhancement. Responsive web design, there we go. Can you make this responsive for these six viewports, or something?
Right. Not yet.
I knew it.
I got you. Because of the resize thing that I told you about.
It's not Jose's fault, because I don't think any of these things can do that. It's hilarious though.
I love it. This pursuit of rightness. Oh, I'll try. I'll actually try it. Because I've been using,
I've been doing a lot of front end lately and I'm not good at it anymore. I'm learning. All the new tools are fancy and they're hard to use. I can't figure clamp out. I mean, I've been using clamp wrong for weeks now; I'm finally starting to get it to work. And none of these tools can do it either. So I play what I call LLM Russian roulette. I take the same prompt, and I'm like, hey, can you do this thing in SVG, or whatever? Like, I'm trying to accomplish stuff that I don't think is possible, but I thought should be possible. It's the modern web, you know? And so I ask ChatGPT. I ask Claude. I'll even ask Grok if I get too angry. And then I'll ask Gemini. And they all give me different responses that are all wrong. None of them can do it. I want one that just tells me: actually, Jared, that's not a thing that you can do. You know, like, you can't do that with web technology. They're not going to do that, 'cause they want to make me happy. But I know that for that kind of stuff, we're not there yet, man. I'm just doing way too much work in the browser as a human right now.
Here's how I would try to implement it. Okay. Okay. And let me know if that's an approach that you tried, because if that's an approach that you tried, then my solution obviously is not going to work. I'll let you know, trust me. Yeah. So, I hope you haven't tried this. I was talking about the resize. That's something we identified recently, so we haven't implemented it. It doesn't currently have the ability to resize, which means that it cannot validate responsive designs, right? As simple as that. But what I would try is: add the feature to resize, and the feature to take a screenshot of the page, which has some other complications, because the browsers don't allow you to do it, for security reasons as well. I know how to solve it; it's just going to take some work. I'm just explaining why we are not going to have this feature tomorrow. And then have it look at the screenshots and see if things are good or bad. What do you think about this approach?
In my experience, their ability to look at screenshots and decipher things is really bad.
It's not there.
Like, they have vision, but it's not precise enough, you know? And so I haven't tried that specifically, but I also don't think it's going to work. I would love you to try and prove me wrong. I would love to be wrong. But in my experience, when you pass a screenshot, or you say, take a screenshot and then inspect the visual, nine times out of 10 they're wrong. All of them.
So... I wonder if we could use accessibility APIs.
This is Jose. The guy is such a problem solver that he like can't help himself right now. He's like, let's, let's debug this thing.
So you already saw me getting off track with the AI suggestion; we saw it live. So this is also a real-life nerd sniping happening
right here. Yep. We're shaving a yak. So yeah, what were you going to say? Accessibility APIs?
If we could use accessibility APIs somehow to measure the size of elements, and what is visible and what is not... but maybe, maybe not.
Right. Yeah. I don't know about that. I just get angry and I just do it myself.
Because it goes back to what I was saying, in the sense that the way for us to eliminate the AI guessing is adding more verification tools. So if the browsers could tell me, oh, the fonts here are too small, these things are clipping... that's why I was thinking about accessibility APIs. Because if the browser tells me that, then I can get that information, which is going to be better than a screenshot, and send it to the agent.
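The kind of browser-reported check Jose is wishing for can be approximated today by reading layout metrics yourself. A hedged sketch; the function and thresholds are hypothetical, not an existing browser API. In a real page the numbers would come from `getComputedStyle(el).fontSize`, `el.scrollWidth`, and `el.clientWidth`; here they are plain objects so the logic runs anywhere:

```javascript
// Flag elements whose text is too small or whose content overflows its box.
function auditLayout(elements, { minFontPx = 12 } = {}) {
  const issues = [];
  for (const el of elements) {
    if (el.fontSizePx < minFontPx) {
      issues.push({ id: el.id, problem: "font too small" });
    }
    if (el.scrollWidth > el.clientWidth) {
      issues.push({ id: el.id, problem: "content clipped" });
    }
  }
  return issues;
}

// Structured findings like these are far easier for an agent to act on
// than a screenshot it has to eyeball.
const report = auditLayout([
  { id: "nav", fontSizePx: 10, scrollWidth: 280, clientWidth: 280 },
  { id: "hero", fontSizePx: 18, scrollWidth: 640, clientWidth: 480 },
]);
```

That is the "verification tool instead of guessing" idea in miniature: feed the agent measured facts, not pixels.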
That might actually work.
Right. But I don't know. I don't know if this accessibility API exists, right? So that's why, that's why I'm.
Well, don't ask the LLMs, though. They'll tell you that it does exist. That's right. And they'll give you the code. Emphatically. I love it when they produce SVG and I'm trying to get a tapered border and all this kind of stuff, and they're like, here you go. And then they tell me all the reasons why it's going to look good. And I put it in there, and I'm like, dude, it looks like a bow tie. You just drew a bow tie. And it's so far off that I have to laugh, because otherwise I'm just going to cry and be like, why am I even wasting my time with you guys? So there are certain things where they just have these inadequacies, and they're all inadequate at this point, in my experience. I haven't done 4.5 yet, so maybe after this call I'll go see if Claude can do this. But I don't know. I don't feel like I'm pushing the envelope. I feel like I'm kind of an interesting, intrepid person trying to get something done, and thinking that you can do things that maybe you just can't even do in the browser right now. But I think being able to develop out a simple, I'm not going to say fixed-width, desktop-styled website and say, make this responsive... that should just be a thing. Don't you think? If you build that into Tidewave, Jose, people are going to line up with their money.
I think so. I mean, 'cause that's the thing. I don't want to do that work, right? So I hope that AI can do it.
Yeah, totally.
That's the thing.
Like, I can also do it, but it's just slower and tedious. And that's the promise: we don't have to do this stuff anymore. And I'm not good at it anymore. It's just guess and check. I gotta guess and check. It's still too big. Now it's too little. All right, I'm done complaining. Adam, take us somewhere else. I'm just airing my grievances.
One thing I was going to go back to was, Jose, I think one thing you were mentioning was how, when you scroll tidewave.ai, you see these videos come in. I'm actually back maybe 15 minutes, potentially, but you were describing this page here. And now that I've actually caught up and scrolled it, maybe that's where we can go: what was the aha moment here when you did this? 'Cause you said you were kind of going back and forth. Did you not do any of this design yourself? Did you just sort of prompt it? What was the experience like for getting this page to be like this?
Yeah, no, for this page in particular, we were just doing the design of the page, and then we knew we wanted to add all the scrolling, and we just asked it to do it. And it did it right. And I think what was surprising about that is... I mean, it's obvious, but that's exactly how-
The autoplay of the videos was key, right? Autoplay video, but it wasn't the autoplay, it was how it tested itself to know that it got the autoplay right.
Yes. And that's exactly how we would test it. I mean, it's obvious that the way you test the autoplay scrolling is by scrolling.
If you scroll and you watch it, autoplay, and you make sure the other ones aren't, but it's just running JavaScript.
But it's really nice to see it happening by itself, right? And then it goes back to other stuff: Tidewave has access to everything. Another way that I like to phrase this is, imagine you're working with somebody, and they send a pull request, and you open up the work they did in the browser, and you're like, wait, this looks bad. And then you go back to the person: did you look at it in the browser? Did you even try it out? And the person says, no. And you're like, what? You have to test things in the browser, right? Or, like, I use the REPL all the time as well, right? It helps me develop a lot. But we are asking coding agents to develop without a proper browser, without a REPL. So Tidewave gives all those things as well, right? Oh yeah, you asked about the user-facing tools, and I started talking about the agentic tools. So one is controlling the browser, but the other one is that we also give access to a REPL running inside your web application, because we use the REPL for development. Why are we not giving one to the agent? I would be a worse developer if I didn't have a REPL, right? And then we have MCPs, like, oh, you can install an MCP to talk to Postgres. But I'm like, my web application already knows how to talk to the database. It already has all the credentials in there. Why are you asking me to configure a separate thing? So a lot of the time it builds a feature, then it tests the feature in the browser, and then it does a database query to make sure that the change also happened in the database. So that's kind of... yeah, we're going back 15 minutes, but that's closing the loop on what tools the agent has. And the whole purpose is to make sure it's producing something that is really good, and I'm not going to waste my time telling it obvious things like, oh, the video actually doesn't play.
Oh, the change was not actually saved to the database.
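That loop, build a feature, exercise it through the app, then confirm independently in the store, can be condensed to a sketch. Everything here is illustrative: an in-memory object stands in for Postgres, and the function names are mine, not Tidewave's. The point is the separate verification step after the action:

```javascript
// An in-memory "database" standing in for the app's real Postgres.
const db = { podcasts: [] };

// The feature under test: the application-level create path.
function createPodcast(title) {
  const record = { title };
  db.podcasts.push(record);
  return record;
}

// 1. Exercise the feature, as the agent would through your browser session.
createPodcast("Changelog & Friends");

// 2. Verify independently, the way the agent would run a query through the
//    web app's existing database connection rather than a separate MCP.
const saved = db.podcasts.some((p) => p.title === "Changelog & Friends");
```

If step 2 fails while step 1 reported success, the agent catches the "it looked like it worked but nothing was saved" class of bug before you do.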
One thing you mentioned in there, though, was MCP servers.
Have you, Jared, messed with MCP servers at all? I really haven't personally.
Mostly just Figma's.
Yeah.
And I told you my experience with that, which was nobody's fault, except that the state of the art is not quite where it needs to be,
but. Jose, I imagine you're probably playing with them heavily. How exactly does that fit into your flow?
Because from what I understand, it just adds more tooling to the context window, which is already kind of small. And so we're always battling that, you know, that auto-compression, or just having to refresh the entire chat whenever you feel like it, I suppose. How do you work those kinds of tools into your workflows without, I guess, bloating the context?
I actually have a hot take in here. Okay. It's not a unique hot take, but to answer your question, which is going to kind of reveal the hot take... He's just teasing it.
He was not going to reveal this. He's just teasing the hot take. Stop setting it up, Jose, and give us the hot take.
So almost all of our API is: write code. You can execute code in the context of the web application; you can execute code in the context of the web page. That's it. We are doing all this dance, like, oh, I'm going to have an MCP for the database. No, my web application already knows how to talk to the database, just use that. Oh, I want to have an MCP to talk to GitHub. And I'm like, well, I'm already logged in to GitHub in the browser, I already have the GitHub command line, use that, right? For coding agents, we are even going as far as adding MCPs for documentation. And I'm like, why am I going to a separate website to get documentation? You are a coding agent. The code is on your machine, and usually with the code you have documentation. Why don't you use the documentation that is already on your machine, with the exact version that you're using? Because sometimes you go to the remote server and get the documentation for Phoenix 1.8, but we are still on 1.7, right? So for me, the answer to "there are too many tools, the context thing" is: I'm going to have just a small number of tools, and what those tools do is run code, right? And I'll let them do whatever they need to do. So: keep the set of tools minimal and powerful. And this take, you know, "MCPs are too much, you probably just need code", I'm not the first one to say it. But I also think the developer experience around MCPs for coding agents is really poor. I mean, to be fair, it's new. It's still evolving, right? It's probably six months old at this point. But we had an issue where one of the MCP tools that we're using was working for GPT-5 but not for Gemini. And then we fixed it for Gemini and it broke for GPT-5.
And if the server disconnects, they cannot reconnect again. There are all those sorts of annoying issues there. And then, do you know about the Figma dev mode thing? So there is an MCP for Figma dev mode. You can run Figma on your machine; there's a desktop client. And then I can go to Figma, inspect an element, click a component that I wanted to implement, right? And you know what the workflow is today? I have to go to Figma, click on the component, and then I have to go to the agent and say, "I have selected a component. Please implement it." I'm like, why?
I already clicked. When it's done, you have to redo it yourself.
Yeah.
That's my experience. Oh, good job. Not good. Not good. It doesn't look like it's supposed to.
I already clicked the thing. Why do I have to go back and tell you that I really, like, you know, and then they're like-
Because they're separate tools, right? They're distinctly different tools.
You're meant to have a protocol for those things to communicate.
I know you know the answer. I'm just saying it out loud. Whereas with Tidewave, it's all integrated. It's all-
It's all integrated. It's all integrated. And I actually want to hold off as much as possible on adding MCP support to Tidewave, because I think we will have better integrations if we do it by hand. So for example, when I do Figma for Tidewave, when you click on the Figma thing, we will know, and we'll just tell it: oh, you want to implement this? Yeah, just click the button, right? You don't have to type in the thing.
Just click it. Now, is it going to do it right? That's up to your model, right?
Yeah, maybe we can give more-
And Tidewave is just using whatever model you bring to it, basically?
Yes. What we can do is help it. You'll be able to click something in Figma and then click something in Tidewave, and we'll be able to say, oh, you should implement this, and this is exactly where it is. So we can improve the experience there, right? When we send all that information to the agent, that's going to be ultimately better, and the agent will be able to validate that things look good, like it did with the video and the scrolling. So we are giving it more tools to verify that it did a better job than just working blind. Right, and I think that's going to be true for a lot of things. There is a tool called Conductor that, for example, added a GitHub integration. One of the things they do is that they know which Git branch you're using, so in their GitHub integration they know the comments dropped on a PR for that branch, and they automatically surface that in the UI. So you can ask the agent to resolve a comment as somebody's commenting on GitHub. Building those sorts of experiences with MCP... I don't want to generalize this too much, but I think for coding agents, for a lot of the things you want to build, MCP doesn't allow you to push information, right? That's what I'm complaining about. GitHub should be able to push information through the MCP: oh, there's this comment. Oh, I clicked this on Figma. It doesn't support that. And we're not even talking about the security issues. So I feel like, yeah, I want to give you a good package with everything. At the point you're telling users, oh, just go and install those different MCPs, you kind of gave up on the developer experience, right? Because it's not there yet.
I would tend to agree, I think with that.
How hot was that? Was it way too much teasing for not too spicy, or?
It's a lot of spice in there. A lot of spice.
It's a variety of spices.
There was some hedging in there, you know. I just feel like you could have dropped it a little hotter.
And then I could, yeah.
Yeah. Yeah, we could have gone ghost, you know, gone ghost pepper. Also, I just tend to agree. I think MCP servers seem to be a builder-driven technology right now versus user-driven. It was so quickly adopted by all the builders, and as users we're kind of like, were we asking for this, necessarily? And could you do it so that it was made, I want to say, more transparent perhaps, or maybe just more user-friendly for us as end users? But man, I've never seen an API or specification, a protocol, get built out across the entire tech industry so fast. And we're talking less than a year. I mean, from their first announcement back in November, I believe, was when MCP was announced by the Anthropic team, less than a year ago, and nobody paid much attention to it for three months. And then all of a sudden, in the spring-ish, it's like everybody just started building MCP servers. Like, everybody.
That's true, yeah.
And then as end users we were kind of like, did we ask for this? Or, I don't know... it's like you want to be first, you don't want to be left out. I'm not sure why everybody immediately thought, we've got to do this, but it was pretty interesting to behold.
You know, I don't have a lot of context around MCP servers, so I don't really use any of them. But when I think of them, it's more like a CLI tool that's on the system already. Rather than pollute my context window with a tool that's an MCP server, why not just have a tool on the system that you can use, that doesn't have to be an MCP server?
Like instead of using the GitHub MCP server, you might use the GitHub CLI to access data from GitHub. Is that what you're saying?
Right, like if I already have gh installed and it's already authenticated, why not just use the tool, versus some sort of MCP server that is just sitting in my context? Like, why does it have to be in my session and configured? It's also instrumentation of tooling; it's a lot of ceremony, you know? It's a lot.
But even if you say, look, what if you don't have the GitHub command line tool? Have it write the code, right? It can write Elixir, it can write JavaScript, it can write Python to talk to the API. Then, of course, I think the authentication part of MCP is interesting, because if it had to write a tool, it would have to ask for your credentials somehow. So that part is good. But it feels like, in certain ways, that's probably all we needed: a way for the agents to ask, which we have OAuth and other things for, a way for the agents to ask for your permission to talk to some API on your behalf, right? Because if it has the code, it could also do things like: get the raw data from GitHub, then use whatever library to compute the information that you want, right? And give you a better result than trying to do it with the MCP, getting plain text, and then maybe doing something interesting with that. We have these things that are really good at coding, and we are sometimes dumbing things down to a text interface while they could write code. One of the questions that I ask myself: people are talking a lot about personal devices, right? We're going to have personal devices that are AI-augmented, and that kind of thing. And I say: that thing needs to know how to run code. Because how can you have some generic personal assistant that can do everything, and that thing cannot run code?
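Jose's "get the raw data, then write ordinary code over it" pattern might look like the sketch below. The payload shape loosely follows GitHub's list-review-comments response, but the sample records and the question asked of them are illustrative, not a real API transcript:

```javascript
// Raw data fetched once, e.g. via `gh api repos/OWNER/REPO/pulls/42/comments`
// (sample records, fields abbreviated).
const comments = [
  { path: "lib/app.ex", body: "rename this function", user: { login: "jose" } },
  { path: "lib/app.ex", body: "add a test here", user: { login: "jerod" } },
  { path: "README.md", body: "typo in heading", user: { login: "jose" } },
];

// "What is jose asking me to change in lib/app.ex?" is ordinary code,
// not a dedicated MCP tool per question:
const todo = comments
  .filter((c) => c.user.login === "jose" && c.path === "lib/app.ex")
  .map((c) => c.body);
```

One fetch of raw JSON supports arbitrarily many follow-up questions, where a text-interface tool would need a round trip, and a new tool definition, for each one.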
And he says: it's gotta be able to run code. That's right, you better run code. I'm telling you, get out of here. It's like the first thing on the resume: can this person run code? Well, yeah, I tend to agree. I think MCP is an interesting phenomenon, maybe the most widely builder-adopted technology that I can think of in history. So there it is, it's there now, but it didn't necessarily have to be there. And there you have it, spicy, spicy Jose. What else? I mean, Tidewave, you're trying to make a business out of this thing, you're trying to make a living, what are you trying to do?
Yes, so it is a paid product. We considered it a little bit, but then realized, well, this is an AI thing. It's a very rapidly changing landscape. So if we want to be able to keep up, to stay invested in this and continue improving it, and also support different kinds of frameworks, we need to find a sustainable way of doing that. So yeah, we'll see. The launch was pretty good, we got a lot of people excited, but it also pointed things out: today it's bring your own API key, and the feature that people ask for the most is Claude Code support, being able to bring Codex and Claude Code. And yeah, let's see. So right now, I think in the email I sent you folks, my history of building products has been cataloged by the Changelog.
Yeah, pretty much. We're doing our job there. Like the changelog of Jose's products, you know?
Yes, yes. It really is. Yeah, so there's Livebook, which is also running, right? And now, Tidewave.
And what about Elixir, man? Is it done or are you done with it or still working on it?
Still working on it. Still, it changes. So right now there's a good amount of Tidewave things happening; it's fresh. I think we are about five weeks since we launched. And you know, when you launch something like that, there's a lot of work, feedback and prioritizing. So it's about half-half: half of my time on Elixir, half on Tidewave. But otherwise, most of my work is still going into the Elixir type system and Elixir work. The other thing, going outside of Tidewave: one of the things that makes me excited about AI is that we can look at the tools and find ways to improve and build new developer tools. And I've been exploring some ideas around those areas. So I was saying, the tests that the coding agents write, I usually don't like them. They are redundant, or they use too many mocks. And I think a lot of people don't pay attention to code quality in tests; a test is a test, right? So I'm trying to figure out ways of improving that. For example, when the agent is writing tests, can we measure coverage and guide the agent to write tests based on coverage, but also give information like: oh, those tests are redundant, they are pretty much checking the same lines of code, you can try to refine them. And the cool thing is that we are thinking about those things because we want to automate the agent, but a lot of it translates to better developer tools, right? We release this for the agent, but developers can also use it. So I think a lot of the work that we are doing right now will feed back into better tools. Even when I'm working on Tidewave, a good amount of the work will eventually feed back into better tools for Elixir and the community too.
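A minimal sketch of the redundancy idea Jose describes: if you collect, per test, the set of source lines it executes (coverage.py's dynamic contexts can produce this), a test whose covered lines are a subset of another test's is a candidate for consolidation. The function and data below are illustrative, not Tidewave's implementation:

```python
def redundant_tests(coverage: dict[str, set[int]]) -> list[tuple[str, str]]:
    """Flag tests whose covered lines are a subset of another test's.

    `coverage` maps a test name to the set of source lines it runs.
    Returns (redundant, covered_by) pairs.
    """
    pairs = []
    for name, lines in coverage.items():
        for other, other_lines in coverage.items():
            if name != other and lines <= other_lines:
                pairs.append((name, other))
                break  # one witness is enough
    return pairs

# Hypothetical per-test coverage, e.g. harvested from coverage.py contexts:
per_test = {
    "test_add":       {10, 11, 12},
    "test_add_again": {10, 11},      # strict subset -> flagged
    "test_subtract":  {20, 21},
}
print(redundant_tests(per_test))
```

Two tests with identical coverage would flag each other, which is arguably the right behavior: the pair is redundant, and a human (or the agent) picks which one to keep.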
Right on, man. Well, keep fighting the good fight. Always love talking to you, always love hearing what you are working on. I am gonna give Tidewave a ride in earnest. I have a Rails app now, and you know we have an Elixir, Phoenix app, so I can use it in both contexts and let you know what I think. Give you some feedback.
Let me know. And then right now it's either bring your own key or you can use your GitHub Copilot integration. And then hopefully in about a month... What are the tools that you use today?
So I use Claude Code. I have ChatGPT Pro, but I don't actually use Codex; I'm not sure if I get Codex with Pro. I have Gemini CLI. Okay. I don't know, it's very confusing. Like, when you get Claude Code, do you also get tokens for the API? I don't think so, right? So you buy those separately. I'd rather not buy more of those. Is this why people want to bring their Claude Code subscription? 'Cause they get... Exactly. Yeah, I would love to do that. I have another toll bridge, toll road, as Adam calls them.
Toll booth.
Toll booths. So that's my current setup. Adam, what are you using? You got some Amp subscription maybe?
About five bucks left in Amp. I still can't hold it right. It's always expensive for me.
It's really great though.
It's so cool how it works. It's really one of the best,
but I haven't found a way to hold it in a way that isn't expensive. And so Claude Code primarily, same. I have an Anthropic key, but only because I think one thing had to have it, and I think I got the trial balance they give you. I'm still on that, so there's something past that. But Claude Code, and Amp I like as well. They're cool. I still like Amp, it's just, I haven't found a way to make it not that expensive for me. I don't know, but it is really, really, really good when it works.
So is there anything in particular that you like about it?
It seems to be just... it's got this Oracle. So speaking of Amp, it's got an Oracle where it can go back and consult. It's kind of like ultrathink, now that I think about it, Jerod. It's not quite that, but it's a bit more, where it'll go into a deeper understanding of coding patterns and learned behavior across, let's just say, the Rust CLI ecosystem. Like, how do those work generally? What are good patterns? And it will come back and tell you stuff like that. So I find that its research and its ability to execute in a hands-free YOLO environment is really good. Like, you wind it up on the right thing with the right research, the right context, the right everything, and it just plows through it for hours and does amazing work. But it gets expensive if you don't work with it and babysit it.
Do you prompt for the Oracle, or does it automatically figure out, like plan mode, that now's the time for some Oracle?
Yeah, you know, I think it does it on its own, but you can also say, hey, in this exercise, go ahead and prompt the Oracle as well. Tap them, get them involved. I don't know, it feels cool. It does it and good results come, I guess, but you can either prompt it yourself or it just kind of does it when it needs to. I am not an Amp expert by any means, but that's how I experience it.
Yeah, so wrapping it up: right now it's bring your own API key, and your OpenAI subscription or Claude subscription does not give you an API token. Claude Code is just using a Claude API, but we cannot use that API; it's not allowed according to Anthropic's terms. So we decided to not do that. That's why we're working on the whole Claude Code, Codex integration kind of thing. So either bring your own key, but really, at the end here, I would recommend giving GitHub Copilot a try. It's confusing, because Microsoft calls everything Copilot, right? But there is a GitHub Copilot plan that gives you access to a bunch of different models. And it's a predictable plan, in the sense that the thing with paying for tokens is that it's very hard for you to predict how much it's going to be, and the GitHub Copilot subscription is per message, which at least improves the visibility a little bit. And it has a basic plan that's quite affordable. So that's a good way to try it out for now and give us some feedback. And yeah, we are hopefully launching the Claude Code support. Zed released something called ACP, I don't know if you saw the news, the Agent Client Protocol. So you can talk to Codex, Claude Code, Gemini CLI, right? So we are building on top of that, but it's work, because we're running in the browser, right? And ACP is an IO protocol, so you can figure out all the hoops we have to jump through to make those things talk to each other. But yeah, hopefully we'll be launching that soon, alongside Django, Next.js and so on.
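For a rough idea of why ACP being "an IO protocol" creates hoops for a browser-based tool: the client spawns the agent as a subprocess and speaks JSON-RPC over its stdin/stdout. This is a sketch under the assumption of newline-delimited JSON-RPC 2.0 framing; the flag in the comment echoes how Gemini CLI has exposed ACP, but treat the details as illustrative, not a spec-complete client:

```python
import json

def rpc(method: str, params: dict, id: int) -> bytes:
    """Encode one JSON-RPC 2.0 request as a newline-delimited frame,
    ready to write to the agent subprocess's stdin."""
    msg = {"jsonrpc": "2.0", "id": id, "method": method, "params": params}
    return (json.dumps(msg) + "\n").encode()

# A client would spawn the agent and speak over its pipes, roughly:
#   proc = subprocess.Popen(["gemini", "--experimental-acp"],
#                           stdin=subprocess.PIPE, stdout=subprocess.PIPE)
#   proc.stdin.write(rpc("initialize", {"protocolVersion": 1}, id=1))
#   proc.stdin.flush()
frame = rpc("initialize", {"protocolVersion": 1}, id=1)
print(frame.decode().strip())
```

The hoop Jose alludes to: a browser cannot spawn subprocesses or touch pipes, so something server-side has to own the process and relay the frames.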
I wish Anthropic would just give you, when you get some sort of subscription, a token or a key that you can use against that subscription at the same pace that you use Claude Code, right? I guess they're just subsidizing that to death and don't want to subsidize their API, because I can pay 20 bucks a month or whatever it is and use the dog doo out of Claude Code, but I'd have to pay 200 bucks or 500 bucks equivalent to use the API the same amount. I just made those numbers up, but you can see the discrepancy is there. It doesn't make sense to me. I guess they just want you to use their CLI a lot.
It's not even, I would say it's not even about how much it costs, it's just how predictable it is, right?
Yeah, because you don't want to get dinged for making a bad prompt, you know? I'm fine with 20 bucks, 40 bucks, 50 bucks a month, but just because I use it a lot, don't give me... I tell it to ultrathink and it's like, well, that's $17 for that ultrathink. And it was still wrong. Can I get my money back? Do you do returns?
I know, right? Service degradation is a real problem for me, you know? But here's the thing. I actually think pushing people towards using Claude Code more, or Codex, and building on top of those tools, like with ACP, is not actually a bad idea. Okay, why is that? Because, okay, let me tell you a quick story, I know you gotta go. So when we first implemented Tidewave, we focused on Anthropic and the Claude models. And if you go to Claude Code's prompt, it has things like: you should be concise, you know, don't use too many words, use fewer words. I think it even said at some point that a one-word answer is best. And of course it doesn't listen to that, right? It finishes the feature and just dumps four pages of text about the thing it implemented that nobody ever reads, right? So when we did our prompt, we tested with those things as well. It does improve a little bit, right? But it also says things like: don't write a code comment. It always writes a code comment. So anyway, we wrote the prompt, and then when GPT-5 came out, we decided to give it a try and start supporting OpenAI. And it was very curious, because you would say, hey, implement this feature, it would do all those things, and then at the end: done. And then you would ask something and it would say: good. And then we realized that the prompt we had for Anthropic that was saying be concise, GPT-5 was actually listening to that prompt, and it was being concise. That's why it was just saying done, good. It was not doing any fluff or anything. And that's when you realize that if you're building a coding agent like I am, you actually have to build a prompt per model, right? And now GPT-5 Codex came out with its own prompt that is different from GPT-5's. So just doing that, fine-tuning the prompt per model, that's a pain. That's boring work. That's not something I want to do, right?
And then there's the other thing, which is the tools. At this point, those coding models are becoming so important for those companies that they're actually fine-tuning how the model should send edits to a file. You know, they are fine-tuning the models for that. So when the GPT-5 Codex model came out, they also said: look, this model is best at sending these kinds of diffs and edits over the wire. So now I have to implement specific editing tools per model that I support. And then each of those models comes with its own context engineering techniques. So at that point, if you're like me, building a coding agent, you want to be able to get that infrastructure and build on top, right? And then comes the nice thing. Going back to the hot take: if you're building your agentic tooling for coding, don't do the MCP, build on top of ACP, have control of the agent, and use all those things and extend that instead. And with the announcement of Claude Sonnet 4.5 today, yesterday, they actually recognized that. They renamed the Claude Code SDK to the Claude Agent SDK. They moved a couple of things around for it to be a better SDK for people to build on top of, right? Because I think there is a lot to gain from leveraging everything. They are tightening the model to those tools, right? And we want to be able to leverage that.
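A sketch of the per-model bookkeeping Jose is describing: a coding agent ends up dispatching a different system prompt and a different edit format per model family. The profiles below are invented for illustration, not Tidewave's actual prompts, though `str_replace` and `apply_patch` echo the editing conventions Anthropic and OpenAI have published for their models:

```python
from dataclasses import dataclass

@dataclass
class ModelProfile:
    """Hypothetical per-model tuning a coding agent carries around:
    each model family wants its own prompt and wire format for edits."""
    system_prompt: str
    edit_format: str  # how file edits travel over the wire

# Illustrative values only:
PROFILES = {
    "claude":      ModelProfile("Be concise. No code comments.", "str_replace"),
    "gpt-5":       ModelProfile("Explain briefly after each step.", "unified_diff"),
    "gpt-5-codex": ModelProfile("", "apply_patch"),  # ships its own prompt
}

def profile_for(model: str) -> ModelProfile:
    """Pick the tuned profile by longest matching family prefix."""
    matches = [k for k in PROFILES if model.startswith(k)]
    if not matches:
        raise ValueError(f"no tuned profile for {model!r}")
    return PROFILES[max(matches, key=len)]  # longest prefix wins

print(profile_for("claude-sonnet-4-5").edit_format)
```

The longest-prefix rule matters for exactly the case in the story: `gpt-5-codex` must not fall through to the plain `gpt-5` profile.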
That makes sense. They're putting a lot of work in to take that model and make it an agent. And there's no reason why everybody else needs to do that work as well.
What could you, how would you go about building on that right now? Like where would you go? What's the starting point to building that right now?
I don't know, what's the website?
Oh, I don't know.
Zed.dev slash ACP or something?
No, I would just search for agent client protocol on Google and see where that-
Agent client protocol.com.
All right. All together, right? No.
All spelled out, yep.
All right, yeah.
Of course, when you Google that,
you'll be able to go to your first hit. And I believe there is a TypeScript SDK. I don't know about other languages right now, but the protocol is-
That's the only language that you know? You heard it now, there's a hot take. You heard it here first. Jose only knows TypeScript. That's when you know it's getting late. I just auto-completed you.
So yeah, but I can see it becoming more and more important, and we are going to see more SDKs, but the protocol is also relatively straightforward. So I really see a lot of value in there, and I hope it catches on. Because I think Gemini CLI supports it built into the CLI, but when Zed released support for Claude Code, for example, it's because they have a wrapper. So I hope it grows to the point where the CLIs come with built-in support for it, right? And then, which of the big providers have their CLI version right now? It's Gemini, it's OpenAI, and it's Anthropic, right? Grok, they don't have theirs, I think. Z.ai, they don't have theirs. So I actually hope those other companies start providing those CLIs as well, with all those things we have been talking about, in the sense of: look, here are the optimized diffs, those are the things we optimized for. So we can move to the point where we are all building on top and not reinventing that wheel. So I really hope it grows.
More CLIs, give them to me. There you go. Thanks for hanging with us, Jose. It's always a pleasure, man.
My pleasure, yeah.
All right, bye friends. Ooh, synchronized. All right, that is your changelog for this week. We hope you enjoyed Monday's news episode about exiting the feed, Vercel versus Cloudflare, and why over-engineering happens. We hope you enjoyed Wednesday's interview with Evan You, and we hope you enjoyed this episode with Jose, because after all, we're here for your enjoyment. We also want you to learn and to keep up the easy way, and we want you to level up your own work and to feel connected to this worldwide community of hackers, but we'd love for you to do all those things while enjoying the process. Because after all, the process, that's our life, isn't it? If you do enjoy our work, please tell a friend or three, or send us an email, editors at changelog.com. We absolutely love hearing from you all. Thanks once again to our partners at Fly.io and to our sponsors of this episode, depot.dev, fabi.ai, and miro.com. Thanks also to the one, the only, the mysterious Breakmaster Cylinder. Have yourself a great weekend, let someone else praise you and not your own mouth, and let's talk again real soon.