
Testing works because tests are (essentially) a second, crappy implementation of your software. Tests only pass if both implementations of your software behave the same way. Usually that will only happen if the test and the code are both correct. Imagine your code (without tests) has a 5% defect rate, and the tests also have a 5% defect rate (with 100% test coverage). Then ideally, if the defects in the code and in the tests are independent, you end up with a 5% × 5% = 0.25% defect rate after fixing every bug the tests catch.
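A rough sketch of that arithmetic, assuming (and this is a strong assumption) that defects in the code and defects in the tests are independent, so a bug only escapes when both are wrong in the same place. The function name `escaped_defect_rate` is made up for illustration.

```python
def escaped_defect_rate(code_defect_rate: float, test_defect_rate: float) -> float:
    """Chance that a defect slips through, assuming independence.

    Assumption: a defect escapes only when the code is wrong AND the
    test covering it is also wrong in a way that hides the problem.
    """
    return code_defect_rate * test_defect_rate


rate = escaped_defect_rate(0.05, 0.05)
print(f"{rate:.2%}")  # 0.25%
```

If the two kinds of defects are correlated (say, the same misunderstanding of the spec ends up in both the code and the test), the real number will be worse than this.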

The price you pay for tests is that they need to be written and maintained. Writing and maintaining code is much more expensive than people think.

Or at least it used to be. Writing code with Claude Code is essentially free. But the defect rate has gone up. This makes TDD a better value proposition than ever.

TDD is also great because Claude can fix bugs autonomously when it has a clear failing test case. A few weeks ago I used Claude Code and experts to write a big conformance test suite (300+ tests) for JMAP. (JMAP is a protocol for email.) For fun, I asked Claude to implement a simple JMAP-only mail server in Rust. Then I ran the test suite against Claude's output. Something like 100 of the tests failed. Then I asked Claude to fix all the bugs found by the test suite. It took about 45 minutes, but now the conformance test suite fully passes. I didn't need to prompt Claude at all during that time. This style of TDD is a very efficient use of human time when working with an LLM.



I think there is a difference between doing TDD and writing tests after the fact to avoid regressions. TDD only works decently if you already know your specs very well. It works much less well when you still need to figure them out and have to build something real to do so.


Yes; I think this remains true with coding agents. If you need to do some exploration of the solution space, it makes sense to do that before writing tests. Once you have a clear, workable design, you can get the agent to make a battery of tests to make sure the final product works correctly.


This is great. The tests in this case are the spec. When you give the agent something concrete to fail against, it knows what done looks like.

The problem is if you skip that step and ask Claude to write the tests after.


  > Tests only pass if both implementations of your software behave the same way.
That's not true.

I even addressed this in my comment, as did Dijkstra.


What is untrue about this statement you quoted?


You can have software behave differently while passing the same tests.

Idk man, this is pretty easy to demonstrate. Start with a trivial example: test is that input (2,2) -> 4. Function 1 does multiplication, function 2 does exponentiation. Both functions pass the test.
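Spelled out as a sketch (the names `multiply`, `power`, and `passes_test` are just illustrative):

```python
def multiply(a: int, b: int) -> int:
    return a * b


def power(a: int, b: int) -> int:
    return a ** b


def passes_test(fn) -> bool:
    # The only test case: input (2, 2) must produce 4.
    return fn(2, 2) == 4


# Both implementations pass the single test...
print(passes_test(multiply), passes_test(power))  # True True

# ...but they disagree as soon as you step outside the tested input.
print(multiply(2, 3), power(2, 3))  # 6 8
```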

Sure, it's a simple example, but illustrative examples should be simple. Add more complexity and I can add more examples of functions whose outputs are the same for a given set of inputs. (There's a whole area of mathematics dedicated to this!) It's simple, but you also confidently claimed something that was trivial to disprove.

Your claim is true if and only if your tests have complete coverage of every possible input. So, your claim is only true if you've done formal verification of your code. Which is what I said in the beginning and is what Dijkstra claimed as well.


I mean, yeah, I thought that was obvious. If you want to be a pedant:

> Tests only pass if both implementations of your software behave the same way in the exact area being tested.

As I said in my comment above, tests are a crappy second implementation. The test in your example isn’t even defined outside the input (2,2). Tests are a stochastic tool. Tests can prove the presence of bugs, not their absence. Completeness isn’t something tests alone can provide. But in the choice between yolo coding and yolo coding plus tests, you’re obviously going to get fewer bugs with tests.



