I had a similar anecdotal experience a few weeks ago.
I was working on a blog entry in a VS Code window and I hadn't yet saved it to disk. Then I accidentally hit the close-window keyboard shortcut... and it was gone. The "open last closed window" feature didn't recover it.
On a hunch, I ran some rg searches over my VS Code files in the Library folder for fragments of text I could remember from what I had written... and it turned out there was a VS Code Copilot log file with a bunch of JSON in it that recorded a recent transaction with their backend - and contained the text I had lost.
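For anyone who wants the same escape hatch, the move is just a recursive search of your editor's data directory for a phrase you remember. A tiny self-contained sketch (the directory and JSON file here are fabricated for the demo; `rg -l` over the real log directory does the same job faster than `grep`):

```shell
set -e
logs=$(mktemp -d)   # stand-in for the editor's real log/cache directory
printf '{"messages":[{"content":"my lost draft paragraph"}]}' > "$logs/copilot.json"

# list every file under the directory containing a remembered fragment
grep -rl "lost draft" "$logs"
```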
At my last job, whenever a commit wouldn't build, we would blast it into a Slack channel with an alert that included the commit and the name of the developer.
Ah yes. Public shaming. The “beatings will continue until morale improves” strategy of code development. Next time, you may want to suggest an evergreen strategy where commits are tested before they’re merged.
"There isn’t enough bandwidth to transmit video to all devices from the servers and there won’t always be good connectivity, so there still needs to be significant client-side AI compute."
So no real operating system, except an AI which operates the whole computer including all inputs and outputs? I feel like there's a word for that.
Well, if the AI controls the computer and is how the user interacts with it, I was going to use "Operating System" myself. But that's two words, my bad.
I would expect this from a 3rd grader, for sure. My friend's sister's schoolbook had photos of Ubuntu and Windows as examples of operating systems. It wouldn't take me more than 5 minutes to explain operating systems at a very high level to a literal toddler, and then to explain why this thing that Elon said was the dumbest thing anyone ever said, period.
Is this what free market capitalism does when people at the top are completely incapable of forming basic thoughts that can make sense and not be a logical paradox?
After reading your post, I am thinking more and more that maybe these guys won't really miss a few billion $ of taxes and other things; nothing can fix the holes in their lives.
Now it makes sense why we have a system which has failed, and why capitalism feels so heartless, so ruthless. It's run by people who don't have a heart - or, in this case, a brain either.
https://gravitypayments.com/the-gravity-70k-min/ This comes to my mind more and more. Why can't companies pay their workers not what they think they can get away with, but what they think is fair? Why is it okay for companies to be morally bad for their workers, and why are people who can't add 1+1 running such companies in the name of AI?
Sounds like the Sun Ray thin client, built by Sun Microsystems in 1999. This was similar to the earlier graphical X terminals, which were reminiscent of mainframe terminals in the 1960s. It's the "wheel of reincarnation".
Super cool!
What I am wondering is whether there is any interest in, let's say, a smartphone that has this tech (see my other comment wishing for an open source phone, somewhere on Hacker News or the internet really).
So let's say we have a really lightweight, customizable smartphone which just connects over wifi or wire to something like a Raspberry Pi or any really lightweight/small server you can carry around. Installing Waydroid on it could make for a really pleasant device, everything could be completely open source, and you could make things modular if you want...
Like, maybe some features (music, a basic terminal, some other things) could run on the device itself via Linux, and anything else, like running Android apps, calls up the server, which you carry around in a backpack with a powerbank.
If I really wanted to make it extremely ideal, the device could have a way of plugging in a second powerbank and removing the first while the system keeps running, so it never shuts down, and you've literally got a genuinely fascinating system that is infinitely modular.
Isn't this sort of what Stadia was? But targeted more at the gaming side, since games require GPUs, which are kinda expensive...
What are your thoughts? I know it's nothing much, but I just want a phone that works for 90% of tasks, which, let's be honest, could be done through a really tiny Linux or a Sun Ray as well. And if you need something like an Android app running, be prepared for a Raspberry Pi with a cheap battery running in your pocket. Definitely better than creating systems of mass surveillance. My only nitpick with my own idea is that it may be hard to secure the communication side if you use something like wifi, but I am pretty sure we can find the right communication method too, and it shouldn't be thaaat big of a deal with some modifications, right?
I usually stay off X. The number of sycophants in that thread is alarming.
I don't want "apps on demand" that change when the AI training gets updated, and now the AI infers differently than yesterday - I want an app the bank vetted and verified.
Commit even as a WIP before cleaning up! I don't really like polluting the commit history like that but with some interactive rebase it can be as if the WIP version never existed.
(Side ask to people using Jujutsu: isn't it a use case where jujutsu shines?)
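For what it's worth, git's autosquash flow automates exactly that "as if the WIP never existed" cleanup. A self-contained sketch in a scratch repo (file names and messages are invented; `GIT_SEQUENCE_EDITOR=true` just accepts the generated rebase todo non-interactively):

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name dev

echo base > f.txt;  git add f.txt; git commit -qm "base"
echo work > g.txt;  git add g.txt; git commit -qm "Add feature X"
echo more >> g.txt; git commit -qam "fixup! Add feature X"   # the WIP follow-up

# replay history with the fixup folded in; the WIP commit disappears
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash HEAD~2
git log --oneline
```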
Assuming you squash when you merge the PR (and if you don't, why not?), why even care? Do people actually look through the commit history to review a PR? When I review I'm just looking at the finished product.
I don't typically review commit by commit, but I do really appreciate having good commit messages when I look at the blame later on while reading code.
Which is just the PR message if you squash. To be clear, I'm not advocating for bad messages, but I am saying I don't worry about each commit and focus instead on the quality and presentation of the PR.
Indiscriminate squashing sucks. Atomic commits are great if you want the git history to actually represent a logical changelog for a project, as opposed to a pointless literal keylog of what changes each developer made and when. It will help you if you need to bisect a regression later. It sucks if you bisect and find the change happened in some enormous incohesive commit. Squashing should be done carefully to reform WIP and fix type commits into proper commits that are ready for sharing.
> It sucks if you bisect and find the change happened in some enormous incohesive commit.
But why are any PRs like this? Each PR should represent an atomic action against the codebase - implementing feature 1234, fixing bug 4567. The project's changelog should only be updated at the end of each PR. The fact that I went down the wrong path three times doesn't need to be documented.
That's true, some are big and messy, or the change has to be created across a couple of PRs, but I don't think that the answer to "some PRs are messy" is "let's include all the mess". I don't think the job is made easier by having to dig through a half dozen messy commits to find where the bug is as opposed to one or two large ones.
> I don't think that the answer to "some PRs are messy" is "let's include all the mess"
Hey, look at us, two like-minded people! I never said "let's include all the mess".
Looking at the other extreme, someone in this thread said they didn't want other people to see the 3 attempts it took to get it right. Sure, if it's just a mess (or, since this is 2025, AI slop), squash it away. But in some situations you want to keep a history of the failed attempts. Maybe one of them was actually the better solution and you were just short of making it work, or maybe someone in the future will be able to see that method X didn't work and won't have to find out for himself.
I can see the intent, but how often do people look through commit history to learn anything beside "when did this break and why"? If you want lessons learned put it in a wiki or a special branch.
Main should be a clear, concise log of changes. It's already hard enough to parse code, and it's made even harder by then parsing versions throughout the code's history. We should try to minimize the cognitive load required to track the number of times something is added and then immediately removed, because there's going to be enough of that already in the finished merges.
> If you want lessons learned put it in a wiki or a special branch.
You already have the information in a commit. Moving that to another database like a wiki or markdown file is work and it is lossy. If you create branches to archive history you end up with branches that stick around indefinitely which I think most would feel is worse.
> Main should be a clear, concise log of changes.
No, that's what a changelog is for.
You can already view a range of commits as one diff in git. You don't need to squash them in the history to do that.
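Concretely, in a scratch repo (commits invented for the demo), the "range as one diff" view is just a two-dot range:

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name dev

echo one   > f.txt; git add f.txt; git commit -qm "wip 1"
echo two  >> f.txt; git commit -qam "wip 2"
echo three >> f.txt; git commit -qam "wip 3"

# view the last two commits as a single combined diff; no squashing required
git diff HEAD~2..HEAD
```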
I am beginning to think that the people who advocate for squashing everything have `git commit` bound to ctrl+s and smash that every couple minutes with an auto-generated commit message. The characterization that commits are necessarily messy and need to be squashed as to "minimize the cognitive load" is just not my experience.
Nobody who advocates for squashing even talks about how they reason about squashing the commit messages. Like it doesn't come into their calculation. Why is that? My guess is, they don't write commit messages. And that's a big reason why they think that commits have high "cognitive load".
Some of my commit messages are longer than the code diffs. Other times, the code diffs are substantial and there is a paragraph or three explaining them in the commit message.
Having to squash commits with paragraphs of commit messages always loses resolution and specificity. It removes context and creates more work for me to try to figure out how to squash it in a way where the messages can be understood with the context removed by the squash. I don't know why you would do that to yourself?
If you have a totally different workflow where your commits are not deliberate, then maybe squashing every merge as a matter of policy makes sense there. But don't advocate that as a general rule for everyone.
Commits aren't necessarily messy, but they're also not supposed to be necessarily clean. There's clearly two different work flows here.
It seems some people treat every commit like it's its own little tidy PR, when others do not. For me, a commit is a place to save my WIP when I'm context switching, or to create a save point when I know my code works so that I can revert back to that if something goes awry during refactoring, it's a step on the way to completing my task. The PR is the final product to be reviewed, it's where you get the explanation. The commits are imperfect steps along the way.
For others, every commit is the equivalent of a PR. To me that doesn't make a lot of sense - now the PR isn't an (ideal world) atomic update leading to a single goal, it's a digest of changes, some of which require paragraphs of explanation to understand the reasoning behind. What happens if you realize that your last commit was the incorrect approach? Are you constantly rebasing? Is that the reader's problem? Sure, that happens with PRs as well, but again, that's the difference in process - raising a PR requires a much higher standard of completion than a commit.
It is not a philosophical debate against auto-squash. It's just that auto-squash deletes potentially useful data automatically and provides zero benefit. Or what is the benefit?
1. The PR message doesn't contain _all_ the intent behind each change, while the commits did cover the intent behind each technical decision. Would you put everything in the PR message? Why? It would just be a misleading, messy comment for unrelated architectural components.
2. You branch from a WIP branch (why? just because), and now when the original branch is merged you can't rebase, as GitHub/GitLab has messed up the parents.
3. Interactive rebase before merge works fine for the wip-wip-wip style of coding. But honestly, the wip-wip-wip style happens on bad days, not on days when you are executing a plan.
4. Most importantly: you can read the PR messages if you filter on merge commits, but if you auto-squash you lose information.
The workflow you're describing here is fine for staging or stashing or commits that you don't intend to publish. I'll sometimes commit something unfinished, then make changes and either stash those if it doesn't work out, or commit with --amend (and sometimes --patch) to make a single commit that is clean and coherent. The commits aren't always small, but they're meaningful and it's easier to do this along the way than at the very end when it's not so clear or easy to remember all the details from commits that you made days ago.
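A runnable sketch of that amend flow in a scratch repo (everything here is invented for the demo; plain `git add f.txt` stands in for the interactive `--patch` hunk selection):

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name dev

echo start > f.txt; git add f.txt; git commit -qm "Add feature X"
echo fix  >> f.txt             # later follow-up work on the same change
echo junk  > scratch.txt       # unrelated noise to keep out of the commit
git add f.txt                  # stage only what belongs (interactively: git add -p)
git commit -q --amend --no-edit   # fold it into the existing commit
git log --oneline              # still a single, coherent commit
```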
> It seems some people treat every commit like it's its own little tidy PR
Pull requests are not always "little". But I'm nitpicking.
It sounds like a big difference between the workflows is that you don't amend or stash anything locally along the way. I guess I find it easier and more valuable to edit and squash changes locally into commits before publishing them. Instead of at the very end as a single pull request. For me, I can easily document my process with good commit messages when everything is fresh in my mind.
The end result is that some commits can be pretty big. Sometimes there is no good opportunity to break them down along the way; that is part of the job. But the end result is that these problems concerning a messy history shouldn't be there. The commits I write should be written with the intent of being meaningful for those who might read them. And it's far easier to do that along the way than at the end when it's being merged. So it's difficult to understand some of the complaints people make when they say it's confusing or messy or whatever.
Even if requirements change along the way while the pull request is open, and more commits are added as a response. I've just never had a situation come up where I'm blaming code or something and looking back at the commit history and struggling to understand what was going on in a way where it would be more clear if it had been squashed. Because the commits are already clean and, again, it's just easier to do this along the way.
But again, I use `git commit --verbose --amend --patch` liberally to publish commits intentionally. And that's why it feels like a bit of busywork and violence when people advocate for squashing those.
You say "two different work flows here" and I think perhaps a better way of considering this is as having multiple _kinds_ of history.
Most of the time, I don't have a clean path through a piece of work such that I can split it out into beautifully concise commits with perfect commit messages. I have WIP commits, messy changes, bad variable names, mistakes, corrections, real corrections, things that I expect everyone has. I commit every one of them. This is my "private history", or my scratch work.
After I've gotten to the end and I'm satisfied that I've proven my change does what it's supposed to do (i.e. tests demonstrate the code), I can now think about how I would communicate that change to someone else.
When I'm in this stage, it sometimes leads to updating names, now that I'm focusing on communicating my intention. But I figure out how to explain the end result in broad strokes, and subdivide where necessary.
From there, I build "public history" (leveraging all the git tricks). This yields pieces that are digestible and briefly illuminated with a commit message. Some pieces are easy to review at a glance; some take some focus; some are large; some are small.
But key is that the public history is digestible. You can have large, simple changes (e.g. re-namings, package changes) that, pulled out as a separate commit, can be reviewed by inspection. You can have small changes that take focus to understand, but are easily called out for careful attention in a single commit (and divorced from other chaff).
By having these two sorts of history, I can develop _fearlessly_. I don't care what the history looks like as I'm doing it, because I have the power (through my use of git) to construct a beautiful exposition of development that is coherent.
A PR being messy and cluttered is a choice. History can be _easily_ clarified. Anyone who uses git effectively should be able to take a moment to present their work more like framed artwork, and not a finger-painted mess stuck to the refrigerator.
I can see your point, and sometimes I myself include PoC code as a commented-out block that I clean up in a later PR, in case it proves to be useful.
But the fact is, your complete PR commit history gives most people a headache, unless it's multiple important fixes in one PR for convenience's sake, which happens very rarely, at least for me. Important things should be documented in, say, a separate markdown file.
This simply isn’t true unless you have to put everything in one commit?
To be honest, I usually get this from people who have never realized that you can merge dead code (code that is never called). You can basically merge an entire feature this way, with the last PR "turning it on" or adding a feature flag - optionally removing the old code at this point as well.
So maintaining old and new code for X amounts of time? That sounds acceptable in some limited cases, and terrible in many others. If the code is being changed for another reason, or the new feature needs to update code used in many places, etc. It can be much more practical to just have a long-lived branch, merge changes from upstream yourself, and merge when it's ready.
My industry is also fairly strictly regulated and we plainly cannot do that even if we wanted to, but that's admittedly a niche case.
> So maintaining old and new code for X amounts of time?
No more than normal? Generally speaking, the author working on the feature is the only one who’s working on the new code, right? The whole team can see it, but generally isn’t using it.
> If the code is being changed for another reason, or the new feature needs to update code used in many places, etc. It can be much more practical to just have a long-lived branch, merge changes from upstream yourself, and merge when it's ready.
If you have people good at what they do ... maybe. I’ve seen this end very badly due to merge artefacts, so I wouldn’t recommend doing any merges, but rebasing instead. In any case, you can always copy a function to another function: do_something_v2(). Then, after you remove the v1, drop the _v2 suffix. It isn’t rocket science.
> My industry is also fairly strictly regulated and we plainly cannot do that even if we wanted to, but that's admittedly a niche case.
I can’t think of any regulations in any country (and I know of a lot of them) that dictate how you do code changes. The only thing I can think of is your own company’s policies in relation to those regulations; in which case, you can change your own policies.
Medical industry, code that gets shipped has to be documented, even if it's not used. It doesn't mean we can't ship unused code, it just means it's generally a pretty bad idea to do it. Maybe the feature's requirement might change during implementation, or you wanted to do a minor version release but that dead code is for a feature that needs to go into a major version (because of regulations).
> I can’t think of any regulations in any country (and I know of a lot of them) that dictate how you do code changes
Our regulatory compliance regime hates it when we run non-main branches in production and specifically requires us to use feature flagging in order to delay rollouts of new code paths to higher-risk markets. YMMV.
Yes, that would be ideal. But especially in a world with infrastructure tied so closely to the application, this standard cannot always be met by many teams.
I so miss bazaar's UI around merges/commits/branches. I feel like most of the push for squashing is a result of people trying to work around git's poor UI here.
The alternative to squashing is not beautiful atomic commits. It is a series of commits where commit #5 fixes commit #2 and introduces a bug to be fixed in commit #7, and where commit #3 introduces a new class that is going to be removed in commits #6 and #7.
Yeah, I don't see the value in looking through that. At best I'll solve the problem, commit because the code works now, create unit tests, commit them, and then refactor one or both in another commit. That first commit is just ugly, and the second one holds no additional information that the end product won't have.
I feel like that requires a lot of coordination that I, in the midst of development, don't necessarily have. Taking my WIP and trying to build a story around it at each step requires a lot of additional effort, but I can see how that would be useful for the reviewer.
We can agree that we don't need those additional steps once the PR is merged, though, right?
I have literally never met a developer who does this (including myself). 99% of all PRs I have ever created or reviewed consist of a single commit that "does the thing" and N commits that fix issues with/debug failure modes of the initial commit.
You never design a solution which needs multiple architectural components that _support_ the feature? I do, and it would make little sense to merge them as separate PRs most of the time, as that would sometimes mean tests written at an inappropriate level, a lot more coordination, and a much more elaborate description than just explaining how the set of components works in tandem to provide the user value.
Basically, jj will give you a checkpoint every time you run a jj command, or, if you set up file watching, every time a file changes. This means you could recover this sort of thing, assuming you'd either run a command in the meantime or had turned that option on.
Beyond that, it is true in my experience that jj makes it super easy to commit early, commit often, and clean things up afterwards, so even though I was a fan of doing that in git, I do it even more with jj.
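(If anyone wants the file-watching mode: as far as I can tell from the jj docs, it's opt-in via Watchman, with something like this in `~/.config/jj/config.toml` - the key names are from my reading of the docs and may differ by version, so check `jj config list core` on yours:)

```toml
[core]
fsmonitor = "watchman"   # let Watchman detect file changes

[core.watchman]
register-snapshot-trigger = true   # snapshot the working copy when files change
```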
I assume Jujutsu only commits the file when you use one of the jj commands. I don't think it keeps a daemon running and checking for changes in the files.
I have heard of jj. I have tried jj. I love jj. But I couldn't get myself to stick with it.
This itself seems to me the thing which will make me push towards jj.
So if I understand correctly, you are telling me that I can have jj automatically record anything I write in the project, and afterwards, by learning some more about jj, I can use that history to create git commits in a sane way and do other things without having to worry too much.
Like, I like git, but it scares me a little bit, and having too many git commits would scare me even further. But I would love to use jj if it can make things less scary.
Like, what would be the command / exact workflow for what I am asking in jj? Just any details, since I am so curious about it. I have also suffered so much from accidentally deleting files, or from looking through chat logs when I was copy-pasting from ChatGPT for some one-off scripts, wishing for a history of my file but not wanting git every time, since it would be more friction than it's worth...
> I can then use that history to create a sane method for me to create git commits and do other thing without having to worry too much.
It's easier than that. Your jj commits are the commits that will be pushed - not all the individual git commits.
Conceptually, think of two types of commits: jj and git. When you do `jj new`, you are creating a jj commit.[1] While working on this, every time you run a command like `jj status`, it will make a git commit, without changing the jj commit. When you're done with the feature and type `jj new` again, you now have two jj commits, and many, many git commits.[2] When you do a `jj git push`, it will send the jj commits, without all the messiness of the git commits.
Technically, the above is inaccurate. It's all git commits anyway. However, jj lets you distinguish between the two types of commits: I call them coarse and fine grained commits. Or you can think hierarchically: Each jj commit has its own git repository to track the changes while you worked on the feature.[2]
So no, you don't need to intentionally use that history to create git commits. jj should handle it all for you.
I think you should go back to it and play some more :-)
I do a bunch of context switching, and I commit every time I switch, as stashing would be miserable. I never expect those WIP commits to be reviewed, and it'd be madness to try.
same - my EOD commits are always titled 'checkpoint commit: <whatever>' and pushed to remote. Then before the MR is made (or turned from draft to final) I squash the checkpoint commits - gives me a psychological feeling of safety more than anything else imo
Git is a distributed version control system. You can do whatever you like locally and it won't "pollute" anything. Just don't merge that stuff into shared branches.
I automatically commit every time my editor (emacs) saves a file and I've been doing this for years (magit-wip). Nobody should be afraid of doing this!
Honest question - What DO you merge into shared branches? And, when your local needs to "catch up", don't you have to pull in those shared commits which conflict with your magit-wip commits because they touch the same code, but are different commit hashes?
Feature branches that have been cleaned up and peer-reviewed/CI-tested, at least in the last few places I worked.
Every so often this still means that devs working on a feature will need to rebase back on the latest version of the shared branch, but if your code is reasonably modular and your project management doesn't have people overlapping too much this shouldn't be terribly painful.
I think there's maybe one other lesson here, though I certainly agree with yours, and with the other commenters who point out the unreliability of this particular method. This feels like an argument for using an editor that autosaves history. "Disk is cheap," as they say - so what if your undo buffer for any given file goes back seven days, or a month? With a good interface to browse through the history?
My editor syncs the total editing state to disk every few minutes, across all open instances.
I eventually added support for killing buffers, but I rarely do (only if there's stuff I need to purge for e.g. liability reasons). After a few years use, I now have 5726 buffers open (just checked).
I guess I should garbage collect this at some point and/or at least migrate it from the structure that gets loaded on every startup (it's client-server, so this only happens on reboot, pretty much), but my RAM has grown many times faster than my open buffers.
Most of Google's internal development happens on a filesystem that saves absolutely everything in perpetuity. If you save it, the snapshot lives on forever--deleting stuff requires filing a ticket with justification. It is amazingly useful and has saved me many times. Version control is separate, but built on it.
When I eventually move on, I will likely find or implement something similar. It is just so useful.
I used to rely on this on the old DEC systems, when editing and saving foo.dat;3 gave you foo.dat;4. It didn't save everything forever - and you could PURGE older versions - but it saved enough to get me out of trouble many times.
I like to mess around. Some of my best work comes out of messing around. The trick is making sure you mess around in a way that lets you easily hold onto whatever improvements you make. For me that means committing obsessively.
A fun anecdote, and I assume it's tongue in cheek (although you never know these days), but is the LLM guaranteed to give you back an uncorrupted version of the file? A lossy version control system seems to me only marginally better than having no VCS at all.
I frequently (basically every conversation) have issues with Claude getting confused about which version of the file it should be building on. Usually what causes it is asking it to do something, then manually editing the file to remove or change something myself and giving it back, telling it to build on top of what I just gave it. It usually takes three or four tries before it will actually use what I just gave it, and from then on it keeps randomly trying to reintroduce what I deleted.
From experience, no. I’ve customized my agent instructions to explicitly forbid operations that involve one-shot rewriting of code for exactly this reason. It will typically make subtle changes, some of which have introduced logic errors or regressions.
When I used tool calls with UUIDs in the name, tiny models like quantized qwen3-0.6B would occasionally get some digits in the UUID wrong. Rarely, but often enough to notice even without automation. Larger models are much better, but give them enough text and they also make mistakes transcribing it.
Well, it'll give you what the tokenizer generated. This is often close enough for working software, but not exact. I notice it when asking Claude for the line number of a specific piece of code. It'll often be off by a few because of the way it tokenizes whitespace.
I'd say it's more likely guaranteed to give you back a corrupted version of the file.
I assume OP was lucky because the initial file seems like it was at the very start of the context window, but if it had been at the end it would have returned a completely hallucinated mess.
Indeed. OP, nothing is "in" an LLM's context window at rest. The old version of your file is just cached in whatever file stores your IDE's chat logs, and this is an expensive way of retrieving what's already on your computer.
I mean there is the chance it's on someone else's computer ^W^W^W^ the cloud, and his provider of choice doesn't offer easy access to deep scrollback ... which means this is only inefficient, not inefficient and pointless.
Technically it doesn't have to be since that part of the context window would have been in the KV cache and the inference provider could have thrown away the textual input.
possible - but KV caches are generally _much_ bigger than the source text and can be reproduced from the source text so it wouldn't make a lot of sense to throw it out
Using LLMs as an extra backup buffer sounds really neat.
But I was trained editing files over telnet on shaky connections, before Vim had auto-backup. You learn to hit save very frequently. After 3 decades I still reflexively hit save when I don’t need to.
I don’t forget to stage/commit in git between my prompts.
The new checkpoint and rollback features seem neat for people who don’t have those already. But they’re standard tools.
This isn't actually Gemini recovering anything - a copy of the file was already stored locally by Cursor. Most modern editors, including VS Code, can recover files from local history without needing Git.
The interesting thing here is the common misconception that LLMs maintain internal state between sessions which obviously they don't. They don't have memory and they don't know about your files.
I've had this exact thing happen, but with the LLM deciding to screw up code it previously wrote. I really love how Jujutsu commits every time I run a "jj status" (or even automatically, when anything changes), it makes it really easy to roll back to any point.
Like the author, I've also found myself wanting to recover an accidentally deleted file. Luckily, some git operations, like `git add` and `git stash`, store files in the repo, even if they're not ultimately committed. Eventually, those files will be garbage collected, but they can stick around for some time.
Git doesn't expose tools to easily search for these files, but I was able to recover the file I deleted by using libgit2 to enumerate all the blobs in the repo, search them for a known string, and dump the contents of matching blobs.
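Plain git plumbing can do the same sweep without libgit2. A self-contained demo in a scratch repo (file name and search string are invented): staging a file writes its blob into `.git/objects`, and `cat-file --batch-all-objects` will list it even after the file is overwritten and the old blob becomes dangling:

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q

echo "my lost paragraph: needle" > draft.txt
git add draft.txt                  # the old version's blob lands in .git/objects
echo "oops, overwritten" > draft.txt
git add draft.txt                  # new blob; the old one is now dangling

# enumerate every blob in the object database and grep for a remembered string
git cat-file --batch-all-objects --batch-check='%(objecttype) %(objectname)' |
  awk '$1 == "blob" { print $2 }' |
  while read -r oid; do
    if git cat-file blob "$oid" | grep -q "needle"; then
      git cat-file blob "$oid" > recovered.txt
    fi
  done
cat recovered.txt
```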
If you sent the Python file to Gemini, wouldn't it be in your chat history database? Relying on the uncertain context window isn't even needed here!
A big goal while developing Yggdrasil was for it to act as long term documentation for scenarios like you describe!
As LLM use increases, I imagine each dev generating much more data than before; our plans, considerations, and knowledge have partially moved into the LLMs we use!
I would have pressed Ctrl-Z in my editor like mad until I got the file back. If I were using vim I could even grep for it through my undo files, thanks to persistent undo.
>give me the exact original file of ml_ltv_training.py i passed you in the first message
I don't get this kind of thinking. Granted I'm not a specialist in ML. Is the temperature always 0 or something for these code-focused LLMs? How are people so sure the AI didn't flip a single 0 to 1 in the diff?
Even more so when applied to other more critical industries, like medicine. I talked to someone who developed an AI-powered patient report summary or something like that. How can the doctor trust that AI didn't alter or make something up? Even a tiny, single digit mistake can be quite literally fatal.
> I refactored all the sketchy code into a clean Python package, added tests, formatted everything nicely, added type hints, and got it ready for production.
The fact that type hints are last in the list, not first, suggests the author's level of experience with the language.
It's disabled by default, but even with the default setup you can find large snippets of code in ~/.gemini/tmp.
tl;dr: Gemini CLI saves a lot of data outside the context window, which enables rollback.
I'm sure other agents do the same, I only happen to know about Gemini because I've looked at the source code and was thinking of designing my own version of the shadow repo before I realized it already existed.
Over the years, I've heard so many stories like these without a happy ending (developers wasting days, sometimes even a week or two, of work because they don't commit and use git often) that my long-held practice is to pretty much always create feature/develop branches and commit as often as possible, often multiple times per hour.
It has nothing to do with the context window. It's that Cursor stores gigabytes of data locally, including your requests and answers. It's classic RAG, not "long context".
I find git is just about the only thing you need to lock down when using AI. Don't let it mess with your git, but let it do whatever else it wants. Git is then a simple way to get a summary of what was edited.
I would recommend the Code Supernova model in Cursor if you want a 1M-token context window. It's free right now since the model is being tested in stealth, but your data will be used by xAI or whoever the model creator turns out to be.
1M context is amazing, but even past 100k tokens Gemini 2.5 Pro is usually incapable of consistently reproducing a 300-LOC file without changing something in the process. And it actually takes a lot of effort to make sure it doesn't touch files it's not supposed to.
With Gemini I have found some weird code-gen issues that are presumably temperature related. Sometimes it will emit a large block of code with a single underscore where there should be a dash, or some similarly close match that would make sense as a global decision but is triggered for only that one instance.
Not to mention sneakily adding functions back in after being told to remove them because they're defined elsewhere. I had a spell where every change was reliably a two-prompt process: 1) do the actual thing, 2) remove A, B and C, which you have reintroduced again.
I have had some very weird issues with Gemini 2.5 Pro where during a longer session it eventually becomes completely confused and starts giving me the response to the previous prompt instead of the current one. I absolutely would not trust it to handle larger amounts of data or information correctly.
This is something I'm currently working on as a commercial solution: the whole codebase sits in a special context window controlled by agents. No need for classic SCM.
For the same reason, I run OpenCode under Mac's sandbox-exec command with some rules to prevent writes to the .git folder or outside of the project (but allowing writes to the .cache and opencode directories).
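For reference, a minimal profile in that direction might look like this (SBPL, the Scheme-like profile language sandbox-exec reads; the path is a placeholder, Apple marks sandbox-exec as deprecated, and a real profile would also need the rules restricting writes outside the project). It would be invoked as something like `sandbox-exec -f agent.sb opencode`:

```scheme
(version 1)
(allow default)
; carve .git out of an otherwise-permissive sandbox
(deny file-write* (subpath "/Users/me/project/.git"))
```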
I use Crystal which archives all my old claude code conversations, I've had to do this a few times when I threw out code that I later realized I needed.
I'm waiting for the day someone builds a wrapper around LLM chats and uses it as a storage medium. It's already been done for GitHub, YouTube videos and Minecraft.
I suppose if you want an extremely lossy storage medium that may or may not retrieve your data, stores less than a 3.5” storage medium, and needs to be continually refreshed as you access it.
Sometimes I notice myself go a bit too long without a commit and get nervous. Even if I'm in a deep flow state, I'd rather `commit -m "wip"` than have to rely on a system not built for version control.
It stands to reason the OP doesn't understand the code or what he (or more likely the LLM) has written if he can't manage to reproduce his own results. We have all been there, but this kind of "try stuff" without understanding the cause and effect of your changes is a recipe for long-term disaster. Also noticeable is the lack of desire to understand what the actual change was, and the reinforcement of bad development practices.
this shit is so depressing, having a "secret sauce" that is just mystical and unknowable, a magic incantation which you hopefully scribbled down to remember later
I cannot wrap my head around the anecdote that opens the article:
> Lately I’ve heard a lot of stories of AI accidentally deleting entire codebases or wiping production databases.
I simply... I cannot. Someone connected a poorly understood AI to prod, and it ignored instructions, deleted the database, and tried to hide it. "I will never use this AI again", says this person, but I think he's not going far enough: he (the human) should be banned from production systems as well.
This is like giving full access to production to a new junior dev who barely understands best practices and is still in training. This junior dev is also an extraterrestrial with non-human, poorly understood psychology, selective amnesia and a tendency to hallucinate.
I mean... damn, is this the future of software? Have we lost our senses, and in our newfound vibe-coding passion forgotten all we knew about software engineering?
Please... stop... I'm not saying "no AI", I do use it. But good software practices remain as valid as ever, if not more!
The common story getting shared all over is from a guy named Jason Lemkin. He’s a VC who did a live vibe-coding experiment for a week on Twitter where he wanted to test if he, a non-programmer, could build and run a fake SaaS by himself.
The AI agent dropped the “prod” database, but it wasn’t an actual SaaS company or product with customers. The prod database was filled with synthetic data.
The entire thing was an exercise but the story is getting shared everywhere without the context that it was a vibe coding experiment. Note how none of the hearsay stories can name a company that suffered this fate, just a lot of “I’m hearing a lot of stories” that it happened.
It’s grist for the anti-AI social media (including HN) mill.
I generally agree with you, but I think a lot of people are thinking about Steve Yegge, in addition to Jason Lemkin. And it did lock him out of his real prod database.
Is there a need for a wiki to collect instances of these stories? It would give us a place to check what claims are actually being made, rather than letting details drift like an urban legend.
I tried to determine the origin of a story about a family being poisoned by mushrooms that an AI said were edible. The country seemed to change from telling to telling and I couldn't pin down the original source. I got the feeling it was an imagined possibility extrapolated from known instances of AI-generated mushroom guides.
There seem to be cases where warnings about what could happen turn into "This Totally Happened" behind a paywall, followed by a lot of "paywalled-site reported this totally happened".
It's a matter of priorities. It's cheap and fast, and there is a chance that it will be OK, or at least OK until I move on. People often make risky choices for those reasons, and not just with IT systems: the crash of 2008 was largely the result of people betting (usually correctly) that the wheels would not fall off until after they had collected a few years of bonuses.
No, but the decision is taken on the basis that it probably will not happen, and if it does there is a good chance that the person taking the decision will not be the one to bear the consequences.
That is why I chose to compare it to the 2008 crash. The people who made the decisions to take the risks that led to it came out of it OK.
OK, if it's a matter of priorities, let's just ignore all hard learned lessons in software engineering, and vibe-code our way through life, crossing fingers and hoping for the best.
Type systems? Who needs them, the LLM knows better. Separate prod, dev, and staging environments? To hell with them, the LLM knows better. Testing? Nope, the LLM told me everything's sweet.
(I know you're not saying this, I'm just venting my frustration. It's like the software engineering world finally and conclusively decided engineering wasn't necessary at all).
I don't think it's the same. I'm not arguing you must not make mistakes, because all of us do.
I mean: if you're a senior, don't connect a poorly understood automated tool to production, give it the means to destroy production, and (knowing they are prone to hallucinations) then tell it "but please don't do it unless I tell you to". As a fun thought experiment, imagine this was Skynet: "please don't start nuclear war with Russia. We have a simulation scenario, please don't confuse it with reality. Anyway, here are the launch codes."
Ignoring all software engineering best practices is a junior-level mistake. If you're a senior, you cannot be let off the hook. This is not the same as tripping on a power cable or accidentally running a DROP in production when you thought you were in testing.
Agreed. Most plausible reason they "can't remember" the good solution is because they were vibe coding and didn't really understand what they were doing. Research mode my ass.
If you're an engineer it can be quite shocking to see how people like the author work. It's much more like science than engineering. A lot of trial and error and swapping things around without fully understanding the implications etc. It doesn't interest me, but it's how all the best results are obtained in ML as far as I can tell.
I'm a scientist and I'd never work that way. I'm methodical, because I've learned it's the fastest and highest-ROI approach.
Guessing without understanding is extremely unlikely to produce the best results in a repeatable manner. It's surprising to me when companies don't know that. For that reason, I generally want to work with experts that understand what they're doing (otherwise is probably a waste of time).
I had a similar anecdotal experience a few weeks ago.
I was working on a blog entry in a VS Code window and I hadn't yet saved it to disk. Then I accidentally hit the close-window keyboard shortcut... and it was gone. The "open last closed window" feature didn't recover it.
On a hunch, I ran some rg searches over VS Code's files in my Library folder for fragments of text I could remember from what I had written... and it turned out there was a VS Code Copilot log file with a bunch of JSON in it that recorded a recent transaction with their backend, and contained the text I had lost.
I grabbed a copy of that file and ran it through my (vibe-coded) JSON string extraction tool https://tools.simonwillison.net/json-string-extractor to get my work back.
Looking back, writing interrupt service routines[1] for DOS as a self-taught teenager has been massively helpful.
Primarily because it taught me to save every other word or so, in case my ISR caused the machine to freeze.
[1]: https://wiki.osdev.org/Interrupt_Service_Routines
There's an option now in VS Code to autosave every few seconds.
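The settings involved (these are VS Code's documented setting keys; the delay value is just an example) look like this in settings.json:

```json
{
  "files.autoSave": "afterDelay",
  "files.autoSaveDelay": 1000
}
```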
If you can't remember what you wrote a 'few seconds' ago then you have more problems than having to work in vscode!
"Who needs git when [..]"
No matter how that sentence ends, I weep for our industry.
Reminds me of a colleague back in the day who would force push to main and just leave a "YOLO" comment in the commit.
At my last job, whenever a commit wouldn't build, we would blast it into a Slack channel with an alert that included the commit message and the name of the developer.
At one job, we had a garish chicken hat that lived in your office if you were the last one to break the build.
This was in the days before automated CI, so a broken commit meant that someone wasn't running the required tests.
Ah yes. Public shaming. The “beatings will continue until morale improves” strategy of code development. Next time, you may want to suggest an evergreen strategy where commits are tested before they’re merged.
‘Works on my local’
No. Evergreen means CI tests your commit, not relying on individuals to run the tests before pushing.
It's mind-blowing to me that any multi-user git repo is set up to allow pushes to main at all.
It’s just a joke setup for a funny story
Read the article, the author is definitely in favor of using git.
That's nothing. Compare https://xcancel.com/elonmusk/status/1956583412203958733 :
"The phone/computer will just become an edge node for AI, directly rendering pixels with no real operating system or apps in the traditional sense."
What's ridiculous is that second paragraph:
"There isn’t enough bandwidth to transmit video to all devices from the servers and there won’t always be good connectivity, so there still needs to be significant client-side AI compute."
So no real operating system, except an AI which operates the whole computer including all inputs and outputs? I feel like there's a word for that.
https://www.hypori.com/virtual-workspace-byod
What's the word? "Robot?"
Well, if the AI controls the computer and is how the user interacts with it, I was going to use "Operating System" myself. But that's two words, my bad.
Operating systems exist to manage access to resources, which requires a very different sort of training than user-interface AIs get.
Computer chips already use AI/machine learning to guess what the next instructions are going to be. You could have the kernel do similar guessing.
But I don't think those AIs would be the same ones that write love letters.
I think what we'll see is LLMs for the people-facing things, and more primitive machine learning for resource management (we already have it).
Sorry, I'm partially responding you and partially to this thread in general.
I am just speechless.
I would expect better from a 3rd grader, honestly. My friend's sister's schoolbook had photos of Ubuntu and Windows as examples of operating systems, and it wouldn't take me more than 5 minutes to explain to a literal toddler who knows about operating systems at a very high level why this thing Elon said was the dumbest thing anyone ever said, period.
Is this what free-market capitalism does when the people at the top are completely incapable of forming basic thoughts that make sense and aren't a logical paradox?
I am thinking more and more, after reading your post, that maybe these guys wouldn't really miss a few billion dollars of taxes and other things; nothing can fix the holes in their lives.
Now it makes sense why we have a system which has failed, and why capitalism feels so heartless, so ruthless. It's run by people who don't have a heart, or in this case a brain either.
https://gravitypayments.com/the-gravity-70k-min/ This comes to mind more and more: why can't companies pay their workers not what they think they can get away with, but what they think is fair? Why is it okay for companies to be morally bad to their workers, and why are such companies run by people who, in the name of AI, can't equate 1+1?
Sounds like the Sun Ray thin client, built by Sun Microsystems in 1999. This was similar to the earlier graphical X terminals, which were reminiscent of mainframe terminals in the 1960s. It's the "wheel of reincarnation".
https://en.wikipedia.org/wiki/Sun_Ray
Super cool! What I am wondering is whether there is any interest in, let's say, a smartphone that has this tech (see my other comment, somewhere on Hacker News or the internet really, wishing for an open-source phone).
So let's say we have a really lightweight, customizable smartphone which just connects over WiFi or a wire to something like a Raspberry Pi or any really small, lightweight server you can carry around; installing Waydroid on it could make a really pleasant device, everything could be completely open source, and you could make things modular if you want...
Maybe some features, like music or a basic terminal, could run on the device itself via Linux and X, and anything else, like running Android apps, calls up the server, which you carry around in a backpack with a power bank.
If I really wanted to make it extremely ideal, the device could have a way of plugging in a second power bank and removing the first while the system keeps running, so it never shuts down, and you'd have a genuinely fascinating system that is infinitely modular.
Isn't this sort of what Stadia was? Just targeted more at gaming, since games require GPUs, which are kind of expensive...
What are your thoughts? I know it's nothing much, but I just want a phone that works for 90% of tasks, which, let's be honest, could be handled by a really tiny Linux or a Sun Ray setup; if you need something like an Android app running, be prepared for a Raspberry Pi with a cheap battery in your pocket. Definitely better than building systems of mass surveillance. My only nitpick with my own idea is that it may be hard to secure the communication side if you use something like WiFi, but I'm pretty sure we can find the right communication method, and it shouldn't be that big of a deal with some modifications, right?
I usually stay off X. The number of sycophants in that thread is alarming.
I don't want "apps on demand" that change when the AI training gets updated, and now the AI infers differently than yesterday - I want an app the bank vetted and verified.
yes yes yes we'll just invent the world from whole cloth at 60 fps
Don't take this as career advice!
This is an amusing anecdote. But the only lesson to be learned is to commit early, commit often.
Commit even as a WIP before cleaning up! I don't really like polluting the commit history like that, but with some interactive rebase it can be as if the WIP version never existed.
(Side ask to people using Jujutsu: isn't it a use case where jujutsu shines?)
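As a concrete sketch of that interactive-rebase cleanup (file and message names invented; `GIT_SEQUENCE_EDITOR=true` just accepts the generated todo list so the rebase runs non-interactively here):

```shell
# Throwaway repo with one real commit and one WIP commit on top.
cd "$(mktemp -d)"
git init -q .
git config user.email dev@example.com
git config user.name dev
git commit -q --allow-empty -m "base"

echo "draft" > post.md
git add post.md
git commit -q -m "post: write draft"

echo "more" >> post.md
git add post.md
git commit -q --fixup=HEAD    # WIP commit, pre-marked for folding in

# Fold the fixup into its target; interactively you would just edit the todo.
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash HEAD~2
git log --oneline             # no trace of the WIP commit remains
```

`git commit --fixup` plus `--autosquash` saves you from manually reordering and marking lines in the rebase todo.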
Assuming you squash when you merge the PR (and if you don't, why not?), why even care? Do people actually look through the commit history to review a PR? When I review I'm just looking at the finished product.
I don't typically review commit by commit, but I do really appreciate having good commit messages when I look at the blame later on while reading code.
Which is just the PR message if you squash. To be clear, I'm not advocating for bad messages, but I am saying I don't worry about each commit and focus instead on the quality and presentation of the PR.
Indiscriminate squashing sucks. Atomic commits are great if you want the git history to actually represent a logical changelog for a project, as opposed to a pointless literal keylog of what changes each developer made and when. It will help you if you need to bisect a regression later. It sucks if you bisect and find the change happened in some enormous incohesive commit. Squashing should be done carefully to reform WIP and fix type commits into proper commits that are ready for sharing.
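A small sketch of the bisect payoff described above (contents invented; the regression is planted at commit 4 of 5, and `git bisect run` finds it automatically because every commit is individually testable):

```shell
# Throwaway repo: five commits, a bug introduced in the fourth.
cd "$(mktemp -d)"
git init -q .
git config user.email dev@example.com
git config user.name dev

for i in 1 2 3 4 5; do
  echo "change $i" >> app.txt
  if [ "$i" -eq 4 ]; then echo "BUG" >> app.txt; fi   # regression sneaks in
  git add app.txt
  git commit -q -m "commit $i"
done

# Known-bad HEAD, known-good first commit; let a test script do the search.
git bisect start HEAD HEAD~4
git bisect run sh -c '! grep -q BUG app.txt'
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset
echo "first bad: $first_bad"
```

With one enormous squashed commit, bisect can only tell you "the bug is somewhere in these 2000 lines".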
> It sucks if you bisect and find the change happened in some enormous incohesive commit.
But why are any PRs like this? Each PR should represent an atomic action against the codebase - implementing feature 1234, fixing bug 4567. The project's changelog should only be updated at the end of each PR. The fact that I went down the wrong path three times doesn't need to be documented.
> Each PR should represent an atomic action against the codebase
We can bikeshed about this for days. Not every feature can be made in an atomic way.
That's true, some are big and messy, or the change has to be created across a couple of PRs, but I don't think that the answer to "some PRs are messy" is "let's include all the mess". I don't think the job is made easier by having to dig through a half dozen messy commits to find where the bug is as opposed to one or two large ones.
> I don't think that the answer to "some PRs are messy" is "let's include all the mess"
Hey look at us, two alike thinking people! I never said "let's include all the mess".
Looking at the other extreme, someone in this thread said they didn't want other people to see the 3 attempts it took to get it right. Sure, if it's just a mess (or, since this is 2025, AI slop), squash it away. But in some situations you want to keep a history of the failed attempts. Maybe one of them was actually the better solution and you were just short of making it work, or maybe someone in the future will be able to see that method X didn't work and won't have to find out for themselves.
I can see the intent, but how often do people look through commit history to learn anything besides "when did this break and why"? If you want lessons learned, put them in a wiki or a special branch.
Main should be a clear, concise log of changes. It's already hard enough to parse code, and it's made even harder by parsing versions throughout the code's history; we should try to minimize the cognitive load required to track the number of times something is added and then immediately removed, because there's going to be enough of that already in the finished merges.
> If you want lessons learned put it in a wiki or a special branch.
You already have the information in a commit. Moving that to another database like a wiki or markdown file is work and it is lossy. If you create branches to archive history you end up with branches that stick around indefinitely which I think most would feel is worse.
> Main should be a clear, concise log of changes.
No, that's what a changelog is for.
You can already view a range of commits as one diff in git. You don't need to squash them in the history to do that.
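For example (branch and file names invented; `git init -b` needs a reasonably recent git), the three-dot diff shows everything a branch adds in one view while each commit stays intact:

```shell
# Throwaway repo: a main branch and a two-commit feature branch.
cd "$(mktemp -d)"
git init -q -b main .
git config user.email dev@example.com
git config user.name dev
git commit -q --allow-empty -m "base"

git checkout -q -b feature
echo "step 1" >> feature.txt && git add . && git commit -q -m "feature: part 1"
echo "step 2" >> feature.txt && git add . && git commit -q -m "feature: part 2"

git diff main...feature          # one combined diff, no squashing required
git log --oneline main..feature  # ...and the individual commits remain
```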
I am beginning to think that the people who advocate for squashing everything have `git commit` bound to ctrl+s and smash that every couple minutes with an auto-generated commit message. The characterization that commits are necessarily messy and need to be squashed as to "minimize the cognitive load" is just not my experience.
Nobody who advocates for squashing even talks about how they reason about squashing the commit messages. Like it doesn't come into their calculation. Why is that? My guess is, they don't write commit messages. And that's a big reason why they think that commits have high "cognitive load".
Some of my commit messages are longer than the code diffs. Other times, the code diffs are substantial and there is a paragraph or three explaining them in the commit message.
Having to squash commits with paragraphs of commit messages always loses resolution and specificity. It removes context and creates more work for me to try to figure out how to squash it in a way where the messages can be understood with the context removed by the squash. I don't know why you would do that to yourself?
If you have a totally different workflow where your commits are not deliberate, then maybe squashing every merge as a matter of policy makes sense there. But don't advocate that as a general rule for everyone.
Commits aren't necessarily messy, but they're also not supposed to be necessarily clean. There's clearly two different work flows here.
It seems some people treat every commit like it's its own little tidy PR, when others do not. For me, a commit is a place to save my WIP when I'm context switching, or to create a save point when I know my code works so that I can revert back to that if something goes awry during refactoring, it's a step on the way to completing my task. The PR is the final product to be reviewed, it's where you get the explanation. The commits are imperfect steps along the way.
For others, every commit is the equivalent of a PR. To me that doesn't make a lot of sense - now the PR isn't an (ideal world) atomic update leading to a single goal, it's a digest of changes, some of which require paragraphs of explanation to understand the reasoning behind. What happens if you realize that your last commit was the incorrect approach? Are you constantly rebasing? Is that the reader's problem? Sure, that happens with PRs as well, but again, that's the difference in process - raising a PR requires a much higher standard of completion than a commit.
It is not a philosophical debate against auto-squash. It is just that auto-squash deletes potentially useful data automatically and provides zero benefit. Or what is the benefit?
1. The PR message doesn't contain _all_ the intent behind each change, while the commits did cover the intent behind each technical decision. Would you put everything in the PR message? Why? It would just be a misleading, messy comment spanning unrelated architectural components.
2. You branch from a WIP branch (why? it just happens); now, when the original branch is squash-merged, you can't rebase cleanly because GitHub/GitLab messed up the parents.
3. Interactive rebase before merge works fine for wip-wip-wip style coding. But honestly, that style happens on bad days, not on days where you are executing a plan.
4. Most importantly: you can read the PR-level messages if you filter on merge commits, but if you auto-squash you lose information.
> The commits are imperfect steps along the way.
The workflow you're describing here is fine for staging or stashing or commits that you don't intend to publish. I'll sometimes commit something unfinished, then make changes and either stash those if it doesn't work out, or commit with --amend (and sometimes --patch) to make a single commit that is clean and coherent. The commits aren't always small, but they're meaningful and it's easier to do this along the way than at the very end when it's not so clear or easy to remember all the details from commits that you made days ago.
> It seems some people treat every commit like it's its own little tidy PR
Pull requests are not always "little". But I'm nitpicking.
It sounds like a big difference between the workflows is that you don't amend or stash anything locally along the way. I guess I find it easier and more valuable to edit and squash changes locally into commits before publishing them. Instead of at the very end as a single pull request. For me, I can easily document my process with good commit messages when everything is fresh in my mind.
The end result is that some commits can be pretty big. Sometimes there is no good opportunity to break them down along the way. That is part of the job. But the end result is that these problems concerning a messy history shouldn't be there. The commits I write should be written with the intent of being meaningful for those who might read them. And it's far easier to do that along the way than at the end when it's being merged. So it's difficult to understand some of the complaints people make when they say it's confusing or messy or whatever.
Even if requirements change along the way while the pull request is open, and more commits are added as a response. I've just never had a situation come up where I'm blaming code or something and looking back at the commit history and struggling to understand what was going on in a way where it would be more clear if it had been squashed. Because the commits are already clean and, again, it's just easier to do this along the way.
But again, I use `git commit --verbose --amend --patch` liberally to publish commits intentionally. And that's why it feels like a bit of busywork and violence when people advocate for squashing those.
You say "two different work flows here" and I think perhaps a better way of considering this is as having multiple _kinds_ of history.
Most of the time, I don't have a clean path through a piece of work such that I can split it out into beautifully concise commits with perfect commit messages. I have WIP commits, messy changes, bad variable names, mistakes, corrections, real corrections, things that I expect everyone has. I commit every one of them. This is my "private history", my scratch work.
After I've gotten to the end and I'm satisfied that I've proven my change does what it's supposed to do (i.e. tests demonstrate the code), I can now think about how I would communicate that change to someone else.
When I'm in this stage, it sometimes leads to updating names now that I'm focusing on communicating my intention. But I figure out how to explain the end result in broad strokes, and subdivide where necessary.
From there, I build "public history" (leveraging all the git tricks). This yields pieces that are digestible and briefly illuminated with a commit message. Some pieces are easy to review at a glance; some take some focus; some are large; some are small.
But key is that the public history is digestible. You can have large, simple changes (e.g. re-namings, package changes) that, pulled out as a separate commit, can be reviewed by inspection. You can have small changes that take focus to understand, but are easily called out for careful attention in a single commit (and divorced from other chaff).
By having these two sorts of history, I can develop _fearlessly_. I don't care what the history looks like as I'm doing it, because I have the power (through my use of git) to construct a beautiful exposition of development that is coherent.
A PR being messy and cluttered is a choice. History can be _easily_ clarified. Anyone who uses git effectively should be able to take a moment to present their work more like framed artwork, and not a finger-painted mess stuck to the refrigerator.
I can see your point, and sometimes I myself include PoC code as a commented-out block that I clean up in the next PR in case it proves useful.
But the fact is, your complete PR commit history gives most people a headache, unless it's multiple important fixes in one PR for convenience's sake, which at least for me happens very rarely. Important things should be documented in, say, a separate markdown file.
This simply isn’t true unless you have to put everything in one commit?
To be honest, I usually get this from people who have never realized that you can merge dead code (code that is never called). You can basically merge an entire feature this way, with the last PR "turning it on" or adding a feature flag, optionally removing the old code at that point as well.
So maintaining old and new code for X amounts of time? That sounds acceptable in some limited cases, and terrible in many others. If the code is being changed for another reason, or the new feature needs to update code used in many places, etc. It can be much more practical to just have a long-lived branch, merge changes from upstream yourself, and merge when it's ready.
My industry is also fairly strictly regulated and we plainly cannot do that even if we wanted to, but that's admittedly a niche case.
> So maintaining old and new code for X amounts of time?
No more than normal? Generally speaking, the author working on the feature is the only one who’s working on the new code, right? The whole team can see it, but generally isn’t using it.
> If the code is being changed for another reason, or the new feature needs to update code used in many places, etc. It can be much more practical to just have a long-lived branch, merge changes from upstream yourself, and merge when it's ready.
If you have people good at what they do ... maybe. I’ve seen this end very badly due to merge artefacts, so I wouldn’t recommend doing any merges, but rebasing instead. In any case, you can always copy a function to another function: do_something_v2(). Then, after you remove the v1, drop the _v2 suffix. It isn’t rocket science.
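The "merge dead code behind a flag" pattern described above can be sketched in a few lines (the names, numbers, and flag are all invented for illustration):

```shell
# Invented example of the "_v2 alongside v1" migration: both code paths
# are merged, but v2 stays dead until the flag flips in a final PR.
USE_V2=${USE_V2:-0}

price_v1() { echo $(( $1 + 8 )); }    # existing behavior, untouched
price_v2() { echo $(( $1 + 9 )); }    # new implementation, merged early

price() {
  if [ "$USE_V2" = "1" ]; then price_v2 "$1"; else price_v1 "$1"; fi
}
```

Once v1 is deleted in a follow-up PR, you rename price_v2 to price and drop the flag.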
> My industry is also fairly strictly regulated and we plainly cannot do that even if we wanted to, but that's admittedly a niche case.
I can’t think of any regulations in any country (and I know of a lot of them) that dictate how you do code changes. The only thing I can think of is your own company’s policies in relation to those regulations; in which case, you can change your own policies.
Medical industry: code that gets shipped has to be documented, even if it's not used. It doesn't mean we can't ship unused code, it just means it's generally a pretty bad idea to do it. Maybe the feature's requirements change during implementation, or you wanted to do a minor version release but that dead code is for a feature that needs to go into a major version (because of regulations).
> I can’t think of any regulations in any country (and I know of a lot of them) that dictate how you do code changes
https://blog.johner-institute.com/regulatory-affairs/design-...
Our regulatory compliance regime hates it when we run non-main branches in production and specifically requires us to use feature flagging in order to delay rollouts of new code paths to higher-risk markets. YMMV.
yes, that would be ideal. but especially in a world with infrastructure tied so closely to the application, this standard cannot always be met by many teams.
Yeah "should" is often not reality, BUT I'm arguing that not squashing doesn't make things better.
I so miss bazaar's UI around merges/commits/branches. I feel like most of the push for squashing is a result of people trying to work around git's poor UI here.
The alternative to squashing is not beautiful atomic commits. It is a series of commits where commit #5 fixes commit #2 and introduces a bug to be fixed in commit #7, and where commit #3 introduces a new class that is going to be removed in commits #6 and #7.
Yeah, I don't see the value in looking through that. At best I'll solve the problem, commit because the code works now, create unit tests, commit them, and then refactor one or both in another commit. That first commit is just ugly and that second holds no additional information that the end product won't have.
It is often easier to review commit-by-commit, provided of course that the developer made atomic commits that make sense on their own.
I feel like that requires a lot of coordination that I, in the midst of development, don't necessarily have. Taking my WIP and trying to build a story around it at each step requires a lot of additional effort, but I can see how that would be useful for the reviewer.
We can agree that we don't need those additional steps once the PR is merged, though, right?
I have literally never met a developer who does this (including myself). 99% of all PRs I have ever created or reviewed consist of a single commit that "does the thing" and N commits that fix issues with/debug failure modes of the initial commit.
Yeah, make it work. Commit. Build unit test. Commit. Fix bugs. Commit. Make pretty. Commit and raise a PR.
You never design a solution which needs multiple architectural components that _support_ the feature? I do, and it would make little sense to merge them as separate PRs most of the time: that would sometimes mean tests written at the inappropriate level, plus a lot more coordination, and a much more elaborate description than just explaining how the set of components works in tandem to provide the user value.
> (Side ask to people using Jujutsu: isn't it a use case where jujutsu shines?)
Yes! For the case discussed in the article, I actually just wrote a comment yesterday on lobsters about the 'evolog': https://lobste.rs/s/xmlpu8/saving_my_commit_with_jj_evolog#c...
Basically, jj will give you a checkpoint every time you run a jj command, or, if you set up file watching, every time a file changes. This means you could recover this sort of thing, assuming you'd either run a command in the meantime or had turned file watching on.
Beyond that, it is true in my experience that jj makes it super easy to commit early, commit often, and clean things up afterwards, so even though I was a fan of doing that in git, I do it even more with jj.
I assume Jujutsu only commits the file when you use one of the jj commands. I don't think it keeps a daemon running and checking for changes in the files.
It does the former by default, and the latter if you configure it.
I have heard of jj. I have tried jj. I love jj, but I couldn't get myself to keep using it.
This itself seems like the thing that will push me towards jj.
So if I understand correctly, you're telling me I can have jj automatically record anything I write in the project, and afterwards, by just learning a bit more about jj, I can use that history to create git commits in a sane way and do other things without having to worry too much.
I like git, but it scares me a little, and having too many git commits would scare me even further. I would love to use jj if it can make things less scary.
What would be the exact command / workflow for this in jj? I'm curious about any details. I have also suffered a lot from accidentally deleting files, or from digging through chat logs when I was copy-pasting from ChatGPT for some one-off scripts, wishing for a history of my file but not wanting git every time, since that would be more friction than it's worth...
> I can then use that history to create a sane method for me to create git commits and do other thing without having to worry too much.
It's easier than that. Your jj commits are the commits that will be pushed - not all the individual git commits.
Conceptually, think of two types of commits: jj and git. When you do `jj new`, you are creating a jj commit.[1] While working on this, every time you run a command like `jj status`, it will make a git commit, without changing the jj commit. When you're done with the feature and type `jj new` again, you now have two jj commits, and many, many git commits.[2] When you do a `jj git push`, it will send the jj commits, without all the messiness of the git commits.
Technically, the above is inaccurate. It's all git commits anyway. However, jj lets you distinguish between the two types of commits: I call them coarse and fine grained commits. Or you can think hierarchically: Each jj commit has its own git repository to track the changes while you worked on the feature.[2]
So no, you don't need to intentionally use that history to create git commits. jj should handle it all for you.
I think you should go back to it and play some more :-)
[1] changeset, whatever you want to call it.
[2] Again - inaccurate, but useful.
I can’t see myself going back to git. When I actually did go back, I was confused for a second that I needed to stash before rebasing.
I always commit when wrapping up the day. I add [WIP] in the subject, and add "NOTE: This commit doesn't build" if it's in a very half-baked state.
I do a bunch of context switching, and I commit every time I switch, as stashing would be miserable. I never expect those WIP commits to be reviewed, and it'd be madness to try.
same - my eod commits are always titled 'checkpoint commit: <whatever>' and push to remote. Then before the MR is made (or turned from draft to final) I squash the checkpoint commits - gives me a psychological feeling of safety more than anything else imo
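The pre-MR squash step described above can be wrapped in a tiny helper (a sketch; the branch name and commit message are placeholders):

```shell
# Collapse all checkpoint commits on the current branch into one,
# keeping the final tree exactly as-is. Assumes the current branch
# forked from the base branch you pass in (e.g. "main").
squash_onto() {                  # usage: squash_onto <base-branch> <message>
  git reset --soft "$1" &&       # drop intermediate commits, keep the files
  git commit -m "$2"
}
# e.g. on the feature branch:  squash_onto main "Add feature X"
```

`reset --soft` moves the branch pointer without touching the working tree or index, so the single new commit contains exactly what the last checkpoint did.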
Git is a distributed version control system. You can do whatever you like locally and it won't "pollute" anything. Just don't merge that stuff into shared branches.
I automatically commit every time my editor (emacs) saves a file and I've been doing this for years (magit-wip). Nobody should be afraid of doing this!
Honest question - What DO you merge into shared branches? And, when your local needs to "catch up", don't you have to pull in those shared commits which conflict with your magit-wip commits because they touch the same code, but are different commit hashes?
Feature branches that have been cleaned up and peer-reviewed/CI-tested, at least in the last few places I worked.
Every so often this still means that devs working on a feature will need to rebase back on the latest version of the shared branch, but if your code is reasonably modular and your project management doesn't have people overlapping too much this shouldn't be terribly painful.
Exactly this. I can make a hundred commits that are one file per commit, and I can later go back and squash them, and that will cleanly leave it as if the hundred commits never happened.

I think maybe there's one other lesson, although I certainly agree with yours, and with the other commenters who talk about the unreliability of this particular method: this feels like an argument for using an editor that autosaves history. "Disk is cheap," as they say -- so what if your undo buffer for any given file goes back seven days, or a month? With a good interface to browse through the history?
I'm sure there's an emacs module for this.
JetBrains' local history has saved my bacon several times. It's a good use of all that unused disk space.
My editor syncs the total editing state to disk every few minutes, across all open instances.
I eventually added support for killing buffers, but I rarely do (only if there's stuff I need to purge for e.g. liability reasons). After a few years use, I now have 5726 buffers open (just checked).
I guess I should garbage collect this at some point and/or at least migrate it from the structure that gets loaded on every startup (it's client-server, so this only happens on reboot, pretty much), but my RAM has grown many times faster than my open buffers.
Most of Google's internal development happens on a filesystem that saves absolutely everything in perpetuity. If you save it, the snapshot lives on forever--deleting stuff requires filing a ticket with justification. It is amazingly useful and has saved me many times. Version control is separate, but built on it.
When I eventually move on, I will likely find or implement something similar. It is just so useful.
I used to rely on this on the old DEC systems, when editing and saving foo.dat;3 gave you foo.dat;4. It didn't save everything forever - and you could PURGE older versions - but it saved enough to get me out of trouble many times.
And in general, maybe have a better working methodology, or whatever you call it. Sounds like messing around to me.
I like to mess around. Some of my best work comes out of messing around. The trick is making sure you mess around in a way that lets you easily hold onto whatever improvements you make. For me that means committing obsessively.
And branches are free.
Emergency Building Fire Procedure:
1. Commit 2. Push 3. Evacuate
if you like it, then you shoulda git commit on it
do it even if you don't like it :)
hopefully most folks understand that it's tongue-in-cheek.
with that said, it's true that it works =)
A fun anecdote, and I assume it's tongue in cheek, although you never know these days, but is the LLM guaranteed to give you back an uncorrupted version of the file? A lossy version control system seems to me to be only marginally better than having no VCS at all.
I frequently (basically every conversation) have issues with Claude getting confused about which version of the file it should be building on. Usually what causes it is asking it to do something, then manually editing the file to remove or change something myself and giving it back, telling it to build on top of what I just gave it. It usually takes three or four tries before it will actually use what I just gave it, and from then on it keeps randomly trying to reintroduce what I deleted.
Your changes aren’t being introduced to its context, that’s why.
The models definitely can get confused if they have multiple copies in their history though, regardless of whether your latest changes are in.
From experience, no. I’ve customized my agent instructions to explicitly forbid operations that involve one-shot rewriting code for exactly this reason. It will typically make subtle changes, some of which have had introduced logic errors or regressions.
When I used tool calls with UUIDs in the name, tiny models like quantized qwen3-0.6B would occasionally get some digits of the UUID wrong. Rarely, but often enough to notice even without automation. Larger models are much better, but give them enough text and they also make mistakes transcribing it.
I dont know where Gemini stores the context, but if I’m using a local LLM client app, that context is on my machine verbatim.
If you ask the LLM to give you back that context does it give back to you verbatim?
statistically, maybe.
Stochastically correct is the best sort of correct?
Well, it'll give you what the tokenizer generated. This is often close enough for working software, but not exact. I notice it when asking claude for the line number of the code with specific implementation. It'll often be off by a few because of the way it tokenizes white space.
Thanks. I noticed the same thing about line numbers but I didn’t know the reason. It has made me double check I’m in the right file more than once.
I'd say it's more likely guaranteed to give you back a corrupted version of the file.
I assume OP was lucky because the initial file seems like it was at the very start of the context window, but if it had been at the end it would have returned a completely hallucinated mess.
If it’s in the context window … it’s sitting around as plain text. I guess asking is easier than scrollback?
Indeed. OP, nothing is "in" an LLM's context window at rest. The old version of your file is just cached in whatever file stores your IDE's chat logs, and this is an expensive way of retrieving what's already on your computer.
I mean there is the chance it's on someone else's computer ^W^W^W^ the cloud, and his provider of choice doesn't offer easy access to deep scrollback ... which means this is only inefficient, not inefficient and pointless.
Technically it doesn't have to be since that part of the context window would have been in the KV cache and the inference provider could have thrown away the textual input.
possible - but KV caches are generally _much_ bigger than the source text and can be reproduced from the source text so it wouldn't make a lot of sense to throw it out
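Back-of-the-envelope arithmetic shows the size gap (the model shape here is invented purely for illustration):

```shell
# Hypothetical model: 80 layers, 8 KV heads, head_dim 128, fp16 (2 bytes/value).
tokens=100000
kv_bytes=$(( 2 * 80 * 8 * 128 * 2 * tokens ))   # 2x for keys and values
text_bytes=$(( tokens * 4 ))                    # ~4 UTF-8 bytes per token
echo "KV cache: ~$(( kv_bytes / 1000000000 )) GB vs. text: ~$(( text_bytes / 1000 )) kB"
# -> KV cache: ~32 GB vs. text: ~400 kB
```

Roughly five orders of magnitude apart, so throwing away the cheap, reproducing-input text to keep the cache would be backwards.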
Using LLMs as an extra backup buffer sounds really neat.
But I was trained with editing files in telnet over shaky connections, before Vim had auto-backup. You learn to hit save very frequently. After 3 decades I still reflexively hit save when I don’t need to.
I don’t forget to stage/commit in git between my prompts.
The new checkpoint and rollback features seem neat for people who don’t have those already. But they’re standard tools.
This isn't actually Gemini remembering anything - a copy of the file was already stored locally by Cursor. Most modern editors, including VS Code, can recover files from local history without needing Git.
The interesting thing here is the common misconception that LLMs maintain internal state between sessions which obviously they don't. They don't have memory and they don't know about your files.
I've had this exact thing happen, but with the LLM deciding to screw up code it previously wrote. I really love how Jujutsu commits every time I run a "jj status" (or even automatically, when anything changes), it makes it really easy to roll back to any point.
An intern I work with got something working, but it wasn't saved anywhere. No git, no email to others in the project (that is how they work).
He complained to me that he "could not find it in ChatGPT history" either.
I think @alexmolas was lucky.
Like the author, I've also found myself wanting to recover an accidentally deleted file. Luckily, some git operations, like `git add` and `git stash`, store files in the repo, even if they're not ultimately committed. Eventually, those files will be garbage collected, but they can stick around for some time.
Git doesn't expose tools to easily search for these files, but I was able to recover the file I deleted by using libgit2 to enumerate all the blobs in the repo, search them for a known string, and dump the contents of matching blobs.
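For the record, a rough plumbing-only equivalent is possible with git itself, no libgit2 needed (hedged: it only finds blobs that haven't been garbage-collected yet):

```shell
# Print the SHAs of unreachable blobs whose contents match a string.
find_blob() {                    # usage: find_blob "known string"
  git fsck --unreachable --no-reflogs 2>/dev/null |
    awk '/blob/ { print $3 }' |
    while read -r sha; do
      git cat-file -p "$sha" 2>/dev/null | grep -q -- "$1" && echo "$sha"
    done
}
# Then dump a match with:  git cat-file -p <sha> > recovered.py
```

This catches the `git add` / `git stash` leftovers described above, since the index and stash count as reachability roots only while they still reference the blob.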
If you sent the python file to Gemini, wouldn't it be in your chat history database? I don't think relying on an uncertain context window is even needed here!
A big goal while developing Yggdrasil was for it to act as long term documentation for scenarios like you describe!
As LLM use increases, I imagine each dev generating so much more data than before, our plans, considerations, knowledge have almost been moved partially into the LLM's we use!
You can check out my project on git, still in early and active development - https://github.com/zayr0-9/Yggdrasil
I confused it with this for a min, which I have played with: https://github.com/yggdrasil-network/yggdrasil-go
Browsing that repo is a bit trippy after being used to my own all this time haha.
I would have pressed Ctrl-Z in my editor like mad until I got the file. If I was using vim I could even grep for it through my history files, thanks to vim-persisted-undo.
Jetbrains IDEs have 'Local History' that keeps a record of all edits to files - I looked in mine now and I have changes from more than 3w ago :)
>give me the exact original file of ml_ltv_training.py i passed you in the first message
I don't get this kind of thinking. Granted I'm not a specialist in ML. Is the temperature always 0 or something for these code-focused LLMs? How are people so sure the AI didn't flip a single 0 to 1 in the diff?
Even more so when applied to other more critical industries, like medicine. I talked to someone who developed an AI-powered patient report summary or something like that. How can the doctor trust that AI didn't alter or make something up? Even a tiny, single digit mistake can be quite literally fatal.
This is just junior level developer thinking. There's so much that this developer doesn't know that they don't know.
> I refactored all the sketchy code into a clean Python package, added tests, formatted everything nicely, added type hints, and got it ready for production.
The fact that type hints are the last in the list, not first, suggests the level of experience with the language
I'm not sure it's the context window. Gemini cli keeps a shadow git repo to be able to rollback changes in cases like this: https://gemini-cli.xyz/docs/en/checkpointing.
It's disabled by default, but even with the default setups, you can find large snippets of code in ~/.gemini/tmp.
tl;dr: Gemini cli saves a lot of data outside the context window that enables rollback.
I'm sure other agents do the same, I only happen to know about Gemini because I've looked at the source code and was thinking of designing my own version of the shadow repo before I realized it already existed.
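In the same spirit, a generic sweep over local caches can be scripted (the paths in the example are guesses and vary by tool and version; adjust for your setup):

```shell
# Grep local agent caches / editor history for a lost snippet.
find_lost() {                    # usage: find_lost <needle> <dir>...
  needle="$1"; shift
  for dir in "$@"; do
    [ -d "$dir" ] && grep -rl -- "$needle" "$dir" 2>/dev/null
  done
  return 0
}
# e.g.: find_lost "ml_ltv_training" ~/.gemini/tmp ~/.config/Code/User/History
```

Missing directories are skipped silently, so you can list every plausible cache location in one call.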
Most IDEs have Local History built-in to retrieve any version of a file, including untracked deleted files: VSCode Timeline, all JetBrains, Eclipse...
Are people coding in Notepad?
Jetbrains local history, stacked clipboard, and recent locations (all searchable) are such a massive developer experience boost.
in a console talking to your ai friend
Can someone come up with a conversion factor for context window sizes to glacier depletion?
Over the years, I've heard so many stories like these without a happy ending - developers wasting days, sometimes even a week or two, of work because they don't like to commit and use git often - that my long-held practice is to pretty much always create feature/develop branches and commit as often as possible, often multiple times per hour.
It has nothing to do with the context window. Cursor stores gigabytes of data locally, including your requests and answers. It's classic RAG, not "long context".
I find git is just about the only thing you need to lock down when using AI. Don't let it mess with your git, but let it do whatever else it wants. Git is then a simple way to get a summary of what was edited.
> Who needs git best practices when you have an LLM that remembers everything?
OK, but if you'd used git properly, you wouldn't have had this problem in the first place.
I would recommend the Code Supernova model in Cursor if you want a 1M token context window. It's free right now since the model is being tested in stealth, but your data will be used by XAI or whoever it turns out the model creator is.
1M context is amazing, but even past 100k tokens, Gemini 2.5 Pro is usually incapable of consistently reproducing a 300 LOC file without changing something in the process. And it actually takes a lot of effort to make sure it does not touch files it is not supposed to.
With Gemini I have found some weird issues with code gen that are presumably temperature related. Sometimes it will emit a large block of code with a single underscore where there should be a dash, or some similar very close match that would make sense as a global decision but is triggered for only that one instance, even in code containing the same identifier elsewhere.
Not to mention sneaking functions back in after being told to remove them because they are defined elsewhere. I had a spell where it was reliably a two-prompt process for any change: 1) do the actual thing, 2) remove A, B and C, which you have reintroduced again.

I have had some very weird issues with Gemini 2.5 Pro where during a longer session it eventually becomes completely confused and starts giving me the response to the previous prompt instead of the current one. I absolutely would not trust it to handle larger amounts of data or information correctly.
Exactly, 1M context tokens is marketing, relatively little training was done at that input size.
This is something i'm currently working on as a commercial solution - the whole codebase sits in a special context window controlled by agents. No need for classic SCM.
I’m completely paranoid about claude messing with my .git folder so I push regularly
For the same reason, I run OpenCode under Mac's sandbox-exec command with some rules to prevent writes to the .git folder or outside of the project (but allowing writes to the .cache and opencode directories).
sandbox-exec -p "(version 1)(allow default)(deny file-write* (subpath \"$HOME\"))(allow file-write* (subpath \"$PWD\") (subpath \"$HOME/.local/share/opencode\"))(deny file-write* (subpath \"$PWD/.git\"))(allow file-write* (subpath \"$HOME/.cache\"))" /opt/homebrew/bin/opencode
Who needs anything when you can keep everything in a 16 TB txt file?
There's probably a meme in here somewhere about how we wrapped your JSON in our JSON so we can parse JSON to help you parse your JSON.
I use Crystal which archives all my old claude code conversations, I've had to do this a few times when I threw out code that I later realized I needed.
Claude Code archives Claude Code conversations. "claude -r" - or go to equivalent of "~/.claude/sessions" or something like that
What does this have to do with 1M context windows… That’s just Cursor keeping your old file around.
I'm waiting for the day someone builds a wrapper around LLM chats and uses it as a storage medium. It's already been done for GitHub, YouTube videos and Minecraft.
I suppose, if you want an extremely lossy storage medium that may or may not retrieve your data, stores less than a 3.5” floppy, and needs to be continually refreshed as you access it.
For Minecraft did they just write text in a book item?
Living life on the edge, huh?
Sometimes I notice myself go a bit too long without a commit and get nervous. Even if I'm in a deep flow state, I'd rather `commit -m "wip"` than have to rely on a system not built for version control.
The more I think about it, the more I suspect 1M context windows are only available while the system has low usage.
That’s a lot of words for “I suck at my job”
It stands to reason the OP doesn't understand the code or what he (or, probably, the LLM) has written if he can't manage to reproduce his own results. We have all been there, but this kind of "try stuff" without understanding the cause and effect of your changes is a recipe for long-term disaster. Also noticeable is a lack of desire to understand what the actual change was, and a reinforcement of bad development practices.
I do.
this shit is so depressing, having a "secret sauce" and it being just mystical and unknowable, a magic incantation which you hopefully scribbled down to remember later
This article is nonsense.
Lol. The context window is actually just a buffer in your client. The guy could probably simply scroll up...
I cannot wrap my head around the anecdote that opens the article:
> Lately I’ve heard a lot of stories of AI accidentally deleting entire codebases or wiping production databases.
I simply... I cannot. Someone let a poorly understood AI connect to prod, and it ignored instructions, deleted the database, and tried to hide it. "I will never use this AI again", says this person, but I think he's not going far enough: he (the human) should be banned from production systems as well.
This is like giving full access to production to a new junior dev who barely understands best practices and is still in training. This junior dev is also an extraterrestrial with non-human, poorly understood psychology, selective amnesia and a tendency to hallucinate.
I mean... damn, is this the future of software? Have we lost our senses, and in our newfound vibe-coding passion forgotten all we knew about software engineering?
Please... stop... I'm not saying "no AI", I do use it. But good software practices remain as valid as ever, if not more!
The common story getting shared all over is from a guy named Jason Lemkin. He’s a VC who did a live vibe-coding experiment for a week on Twitter where he wanted to test if he, a non-programmer, could build and run a fake SaaS by himself.
The AI agent dropped the “prod” database, but it wasn’t an actual SaaS company or product with customers. The prod database was filled with synthetic data.
The entire thing was an exercise but the story is getting shared everywhere without the context that it was a vibe coding experiment. Note how none of the hearsay stories can name a company that suffered this fate, just a lot of “I’m hearing a lot of stories” that it happened.
It’s grist for the anti-AI social media (including HN) mill.
I generally agree with you, but I think a lot of people are thinking about Steve Yegge, in addition to Jason Lemkin. And it did lock him out of his real prod database.
Claude has dropped my dev database about three times at this point. I can totally see how it would drop a prod one if connected to it.
Is there a need for a wiki to collect instances of these stories? Having a place to check what claims are actually being made rather than details drift like an urban legend.
I tried to determine the origin of a story about a family being poisoned by mushrooms that an AI said were edible. The nation seemed to change from time to time and I couldn't pin down the original source. I got the feeling it was an imagined possibility from known instances of AI generated mushroom guides.
There seem to be cases of warnings about what could happen that change to "This Totally Happened" behind a paywall, followed by a lot of "paywalled-site reported this totally happened".
OK, ok, I read the Twitter posts and didn't get the full context that this was an experiment.
I'm actually relieved that nobody (currently) thinks this was a good idea.
You've restored my faith in humanity. For now.
It's a matter of priorities. It's cheap and fast, and there is a chance that it will be OK - even just OK until I move on. People often make risky choices for those reasons, not just with IT systems: the crash of 2008 was largely the result of people betting (usually correctly) that the wheels would not fall off until after they had collected a few years of bonuses.
I do not know who is doing the math, but deleting production data does not sound very cheap to me...
No, but the decision is taken on the basis that it probably will not happen, and if it does there is a good chance that the person taking the decision will not be the one to bear the consequences.
That is why I chose to compare it to the 2008 crash. The people who took the decisions to take the risks that led to it came out of it OK.
OK, if it's a matter of priorities, let's just ignore all hard learned lessons in software engineering, and vibe-code our way through life, crossing fingers and hoping for the best.
Typing systems? Who needs them, the LLM knows better. Different prod, dev, and staging environments? To hell with them, the LLM knows better. Testing? Nope, the LLM told me everything's sweet.
(I know you're not saying this, I'm just venting my frustration. It's like the software engineering world finally and conclusively decided engineering wasn't necessary at all).
99% agree, but:
>(the human) should be banned from production systems as well.
The human may have learnt the lesson... if not, I would still be banned ;)[0]
[0] I did not delete a database, but cut power to the rack running the DB
I cut the power... but I did not drop the database.
I don't think it's the same. I'm not arguing you must not make mistakes, because all of us do.
I mean: if you're a senior, don't connect a poorly understood automated tool to production, give it the means to destroy production, and (knowing they are prone to hallucinations) then tell it "but please don't do it unless I tell you to". As a fun thought experiment, imagine this was Skynet: "please don't start nuclear war with Russia. We have a simulation scenario, please don't confuse it with reality. Anyway, here are the launch codes."
Ignoring all software engineering best practices is a junior-level mistake. If you're a senior, you cannot be let off the hook. This is not the same as tripping on a power cable or accidentally running a DROP in production when you thought you were in testing.
I mean, everyone breaks prod at least once; AI is just one that doesn't learn from the mistake.
OK, great, now you have retrieved that code that you don't even understand and that is completely unmaintainable.
Agreed. Most plausible reason they "can't remember" the good solution is because they were vibe coding and didn't really understand what they were doing. Research mode my ass.
If you're an engineer it can be quite shocking to see how people like the author work. It's much more like science than engineering. A lot of trial and error and swapping things around without fully understanding the implications etc. It doesn't interest me, but it's how all the best results are obtained in ML as far as I can tell.
I'm a scientist and I'd never work that way. I'm methodical, because I've learned it's the fastest and highest-ROI approach.
Guessing without understanding is extremely unlikely to produce the best results in a repeatable manner. It's surprising to me when companies don't know that. For that reason, I generally want to work with experts that understand what they're doing (otherwise is probably a waste of time).