I think the author is taking general advice and applying it to a niche situation.
> So by violating the first rule of clean code — which is one of its central tenants — we are able to drop from 35 cycles per shape to 24 cycles per shape
Look, most modern software is spending 99.9% of the time waiting for user input, and 0.1% of the time actually calculating something. If you're writing a AAA video game, or high performance calculation software then sure, go crazy, get those improvements.
But most of us aren't doing that. Most developers are doing work where the biggest problem is adding the umpteenth feature that Product has planned (but hasn't told us about yet). Clean code optimizes for improving time-to-market for those features, and not for the CPU doing less work.
100%, I’ve done tonnes of (backend) performance optimization, profiling, etc. on higher level applications, and the perf bottlenecks have never been any of the things discussed in this article. It’s normally things like:
- Slow DB queries
- Lack of concurrency/parallelism
- Lack of caching/memoization for some expensive thing that could be cached
- Excessive serialization/deserialization (things like ORMs that create massive in memory objects)
- GC tuning/not enough memory
- Programmer doing something dumb, like using an array when they should be using a set (and then doing a huge number of membership checks)
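As a rough illustration of that last bullet, here is a minimal C++ sketch. The function names `count_hits_linear` and `count_hits_hashed` are made up for this example, and no timings are claimed. Both functions return the same count; the first rescans the whole array for every membership check, while the second builds a hash set once and does average constant-time lookups.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper: counts how many queries appear in `haystack`
// using one linear scan per query, so roughly O(n * m) work overall.
std::size_t count_hits_linear(const std::vector<int>& haystack,
                              const std::vector<int>& queries) {
    std::size_t hits = 0;
    for (int q : queries) {
        if (std::find(haystack.begin(), haystack.end(), q) != haystack.end()) {
            ++hits;
        }
    }
    return hits;
}

// Same result, but the hash set is built once and each lookup is
// O(1) on average, so roughly O(n + m) work overall.
std::size_t count_hits_hashed(const std::vector<int>& haystack,
                              const std::vector<int>& queries) {
    const std::unordered_set<int> seen(haystack.begin(), haystack.end());
    std::size_t hits = 0;
    for (int q : queries) {
        if (seen.count(q) != 0) {
            ++hits;
        }
    }
    return hits;
}
```

For a handful of elements the linear version may well be just as fast; the gap only shows up with large inputs and many checks, which is why profiling first is the sensible order of operations.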
With that being said, I have worked on the odd performance optimization where we had to get quite low level. For example, when working on vehicle routing problems, they’re super computationally heavy, need to be optimized like crazy and the hot spots can indeed involve pretty low level optimizations. But it’s been rare in the work I’ve done.
This article is probably meaningful for people who work on databases, games, OSes, etc., but for most devs/apps these tips will yield zero noticeable performance improvements. Just write code in a way you find clean/maintainable/readable, and when you have perf issues, profile them and ship the appropriate fix.
Casey Muratori knows a lot about optimizing performance in game engines. He then assumes that all other software must be slow because of the exact same problems.
I think the core problem here is that he assumes that everything is inside a tight loop, because in a game engine that's rendering 60+ times a second (and probably running physics etc at a higher rate than that) that's almost always true.
Also the fact that his example of what "everyone" supposedly calls "clean code" looks like some contrived textbook example from 20 years ago strains his credibility.
Edit: come to think of it, the only person I know of who actually uses the phrase "clean code" as if it's some kind of concrete thing with actual rules is Uncle Bob. Is Casey assuming the entire commercial software industry === Uncle Bob? It's like he talked to one enterprise java dev like 10 years ago and based his opinion of the entire industry on them.
The thing that sets him off is that he is using a computer with enormous computing power and everything is slow.
He does have a narrow view, but it does not make his claims invalid.
I liked that his POC terminal made in anger made the Windows Terminal faster. But even in that context it was clear that by making some tradeoffs - which the Windows Terminal team can not make (99.99% of users do not run into the issue, but Windows has to support everything) - it could be even a lot faster.
So we live in a world where we cater for the many 1% use cases, which do not overlap, but that slows everyone down.
Many gamedevs do their own tools, because they are fed up with how slow iteration is. The same thing is happening at bigger companies, at some point productivity starts to matter and off the shelf solutions start to fail.
> The same thing is happening at bigger companies, at some point productivity starts to matter and off the shelf solutions start to fail
This is perhaps the third time I've posted this on HN, but what you describe is the circle of life for widely-used software projects. Large tech companies are not immune to it, resulting in frequent component rewrites, deprecations and almost-drop-in replacements that shuffle complexity up or down the stack.
Step 1: Developer is fed up by how slow/bloated current incumbent is, so they write a fast, lean and mean project that solves their problems
Step 2: The project becomes popular on its merits, rakes in stars on Github as people discover how awesome it is
Step 3: Users start discovering limitations for their use cases, issues and pull requests pour in
Step 4: Thousands of PRs later, the project is usable by most people and has "won". It is now the incumbent: no longer as fast as it once was, but shipping functionality that caters to many niche needs
I've started a project that has the potential to go down this path, but I have a very strict 'implement your own diffs if you want them' policy for this particular project.
Like, feel free to fork away if you like. The core repository needs to be simple and stay true to its goals, and when it updates everyone downstream can update if they want to do that to themselves. But for what it is and does, maybe the project as it is is good enough.
I start feeling almost physically sick when I see the potential for bloat to creep into the software I write. This makes working with scaled software development with others particularly hard, however.
This is so true, especially when it comes to frontend development, and also for backend frameworks, but to a lesser extent.
But I also think that the product, library or framework owner should really box in their project, reject wild growth of features, and prevent generalisation of its usage.
"But even in that context it was clear that by making some tradeoffs"
It was literally a weekend POC, and Casey Muratori even went beyond the POC part and fixed some emoji/foreign language bugs that were present in the Terminal.
Also, his intent was not to replace the Terminal. His intent was to demonstrate that it was possible to do the optimization in the way he suggested. Originally a Microsoft PM dismissed his suggestions and claimed it would be a "doctoral thesis project" or something.
All this "yeah it's a narrow view" is just moving the goalposts more and more. Not only does he have to do a "doctoral thesis project" in a few days, he also has to completely replace a tool that's already written, bells and all? Where does it stop?
But how much of that slowness is due to code that values "cleanliness" excessively? I bet that if you look at the source of nearly any application on your PC, it will be very much not clean on average.
I think it would certainly value the kind of "clean" design patterns (or anti-patterns as I consider most of them) that object-oriented programming evangelists espouse.
It wasn't about the tradeoffs that the Windows Terminal team "can not make" - it was about alternative optimizations and performance concerns that they arrogantly refused to consider as being possible, and which ought to have been considered if they were being properly competent.
Engineering tradeoffs are real. But hiding behind them every time when it can be pointed out that they don't actually apply - and when demonstrated with concrete evidence - is another thing altogether.
> The thing that sets him off is that he is using a computer with enormous computing power and everything is slow.
If that's his complaint, then "clean code" isn't the problem. The problem is capitalism and/or human nature.
Once something performs acceptably well, ie good enough to sell it, performance isn't going to get any better. Flashy stuff and features get you money, going from 400ms to 100ms gets you...nothing.
The industrialization of the Soviet Union was done at the cost of massive hunger, the murders of the Great Purge, and disruption of the neighboring countries (I am talking about pre-1939 now, WW2 Soviet atrocities are not even scratched here), resulting in millions of deaths and bringing misery to generations, in some areas until today.
> He does have a narrow view, but it does not make his claims invalid.
I would tend to disagree on this, especially when claims come from the gamedev world. Games are presented as finished pieces (even when they aren't), and not just a release milestone. Ideally, a game is a one-off effort where you write a piece of code and if you're lucky, you won't have to touch it again. So, doing one-off optimizations instead of focusing on milestones and long-term maintainability of the code is not only a possibility, but actively encouraged. That's why, until rather recently (20 or so years ago), assembly optimization was common for critical execution paths, if not for most of the product.
Most of the rest of the software doesn't work like that. You often implement something that will be maintained, modified, extended and reiterated on for several years, not by you, but by several other teams with totally different experience and backgrounds. Or decades. Doing some fancy trick to skip a cleaner, extensible, maintainable design because you shaved off a couple of cycles on it is literally burning your employer's money and potentially causing huge issues in terms of maintainability, as many programs don't actually rely on a happy path like games do.
The main reason modern systems are slow isn't (just) because programmers are lazy - it's because most software - unlike games - has compatibility and maintainability requirements, and more often than not, a huge legacy to support. And also, in these systems, most of the development time is actually spent maintaining and extending existing code, not writing new code.
The author's assertion is fundamentally wrong, because software engineering is about much more than performance - even when it matters. Flashback to the beginning of the 90's, and "every game" used Bresenham's algorithm to skip usage of the (slow or non-existent) div instruction. In some cases, a couple of bitwise shifts would also eliminate mul operations. These implementations were in some cases 2-4x faster than the classical counterparts, on a 12-40 MHz machine. Two cpu generations later, the Pentium comes out, and both mul and div take 1 clock cycle. The fancy pants implementation is now 3-5x slower at the same speed. Except now the cpu clock is 4x faster and shoveling around registers may actually impede parallel execution of code. All of this in a 5-year window. I envy the relatively stable instruction set of the last decade, where everything is sort-of predictable and assertions of speed can be made on code with a relatively high degree of confidence, but the reality is, silicon is cheap, and for most applications, performance is gained not by throwing away what makes some huge applications barely maintainable, but by deploying hardware. New, fancy, faster, cheaper and more economical hardware. Choosing a single metric (performance) and an instant in time to bitch about something is actually a disservice to the community at large.
> Two cpu generations later, the Pentium comes out, and both mul and div take 1 clock cycle.
Where are you getting this information? Agner[0] lists DIV as taking 17 cycles at best (8-bit operand already in a register) on the P5, and MUL as taking 11 cycles. Even Tiger Lake takes 6 cycles for DIV.
There are ways [1] to beat that, but I don't think you can get it down to a single cycle.
You are completely right, I just had a major brain fart. I probably mixed up some things (or had some bad/incomplete source at the time). More than two decades have passed, so it's probably me with wires crossed.
And the amount of money changes the fact that the core engines are written as a one-off effort how, exactly? Updates to the scripting engine to fix play-ability issues and content updates aren't really heavy software refactoring. Sure, there are usually some actual code bugfixes on the initial releases - and more often than not - related to someone implementing some really clever trick that raises an exception on some cpu. It's not like they are incrementally rewriting and extending the internal engine for a decade, as happens e.g. with a browser.
If you think that, you're not familiar enough with modern games as a service. Fortnite lives on the latest version of Unreal Engine, and Unreal Engine changed a lot from the initial Fortnite development until now with many new features, rewritten parts, and major refactoring of other parts. It's huge and constantly evolving, so it is similar to, e.g., browsers.
"Games are presented as finished pieces" is an idea that's at least a decade out of date. The industry has gone very hard on the idea of Games as a Service and it's now normal for AAA games to receive years of content updates.
From a software perspective, they are. Most games don't change requirements (or base code) during their maintenance releases; these releases may fix some code bugs, but more often than not they provide only incremental updates on content. Compare that with e.g. intermediate releases of software like OpenOffice.
Casey makes the point that you don't have to hand-tune assembly code, but instead just write the simpler code. It's easier to write, easier to read, and runs faster too!
If there's something wrong with that advice, I can't imagine what it is...
As a counterpoint: "Clean Code", at least the variant from the book, is very frequently also extremely difficult to write or to read in larger codebases too.
The claim that "Clean Code" scales better or allows for more maintainable software hasn't been proven by anyone, and everyone with enough experience has worked with several counter examples.
The problem of code maintainability is not solved by this coding philosophy.
Sometimes people forget that "Clean Code" is a book, and it's not exactly stellar. This is a pretty thorough tear-down: https://qntm.org/clean
I'm totally onboard with prioritizing readability over performance for most code, but the style in the book has a lot more tradeoffs than it discloses, and you often don't really appreciate that until you are trying to debug if/how A transitively calls Z in a 10 million line codebase.
It's really hard to have a constructive conversation about this though since it's so subjective, any example is too trivial, and any real system is too large.
Do you have any example of an open source project written in the simple style suggested by Casey Muratori that is very difficult to maintain and that you think would benefit from using "clean code", "SOLID", design patterns and abstractions on top of abstractions?
If you can't point to an actual example, I don't think you have a solid case.
This. Clean code isn't about writing stuff like this. For what he's talking about the overhead of polymorphism is a major part of the total cost and his case is simple enough that there's little value.
However, the bigger your task gets the more value there is to polymorphism and in general the smaller the percent of total time goes to the polymorphism overhead.
And note that his attack is only on polymorphism, not the other aspects of clean code. I strongly suspect the compiler optimizes away much of the clean stuff I do but I have never checked. I also find profiling easier on cleaner code, it makes it very obvious where the time sink must be and thus what warrants expending effort to improve. Profiling almost always shows the vast majority of time going into the unavoidable (say, disk reads) and a small number of other routines. Spend your optimization effort on the spots that need it because 99+% of your code doesn't run often enough for it to matter.
And when you combine inadequate abstractions with programmers who aren't the kind of geniuses brought in to optimize game engines you get very difficult to fix performance problems.
One of the nice things about some of the clean code concepts he uses is that (as he shows) you can tactically step back from them in key, performance critical areas and reap these wins.
If you stay too low level you get lots of tangled spaghetti code with major performance problems and no obvious way forward besides "make it better."
I find it a bit disingenuous to call what Casey Muratori is doing "staying in the low level".
Using procedures/functions is not exactly "low level". Using switch is not low level. Lookup tables are something you have to do in high level code all the time.
Sure he could have used much better variable naming (CTable?) and probably documentation, but code-wise there's nothing that screams low level there.
I'm not sure how it's disingenuous, I sincerely believe what I said and I'm not trying to fool anyone.
I would consider most of his replacements lower level than the typical clean code practices he critiques (especially the ones like iterators that he mentions but avoids in order to steel man the clean code side a little bit), not the lowest level possible. They take into account how the machine actually works and avoid additional indirection, which is why they perform better.
Although it is typically easier to abstract in a "high level" language, abstraction does not require it. This whole debate is rooted on false assumptions and the need to take a side, imo.
Casey has a point, it is just ignored in a typical hand-wavery fashion. "The toy example doesn't scale" is a poor argument, especially when what we can observe is slow software.
The stuff proposed in the post is not rocket science, it is a very straightforward implementation of tagged unions. Instead of fetching a vtable and jumping to a value there, he proposes to branch on the tag. This is essentially dynamic dispatch on a known set of types.
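A minimal C++ sketch of the two dispatch styles being contrasted, for anyone who hasn't seen them side by side. The type names and the flat tag-plus-fields layout are assumptions made for illustration, not the article's exact code, and nothing here is benchmarked.

```cpp
#include <vector>

// Style 1: virtual dispatch. Each call loads the object's vtable pointer
// and jumps through it.
struct ShapeBase {
    virtual ~ShapeBase() = default;
    virtual float area() const = 0;
};

struct Square : ShapeBase {
    float side;
    explicit Square(float s) : side(s) {}
    float area() const override { return side * side; }
};

// Style 2: a tag plus shared fields, with a branch on the tag.
// This only works because the set of shape kinds is known up front.
enum class ShapeKind { Square, Rectangle, Triangle, Circle };

struct Shape {
    ShapeKind kind;
    float width;
    float height;
};

constexpr float kPi = 3.14159265f;

float area(const Shape& s) {
    switch (s.kind) {
        case ShapeKind::Square:    return s.width * s.width;
        case ShapeKind::Rectangle: return s.width * s.height;
        case ShapeKind::Triangle:  return 0.5f * s.width * s.height;
        case ShapeKind::Circle:    return kPi * s.width * s.width;
    }
    return 0.0f;  // only reached if the tag holds an invalid value
}

// The hot loop walks a contiguous array of plain structs.
float total_area(const std::vector<Shape>& shapes) {
    float sum = 0.0f;
    for (const Shape& s : shapes) {
        sum += area(s);
    }
    return sum;
}
```

Leaving out a `default:` label is deliberate: with GCC or Clang and `-Wall`, the `-Wswitch` warning then flags any switch that misses a newly added enumerator.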
Additionally, he shows that this can result in speedups greater than a factor of 1. Any program that wants low latency or high throughput can profit from this observation.
This way of programming is by no means the one to rule them all. It has different advantages and drawbacks, none of which have anything to do with the perceived intelligence of the programmer or later consumers, for that matter.
An objective disadvantage of this style is that the program can't act as a caller of code that hasn't been written yet. Another disadvantage is that the size of the tagged union is defined by its largest "subclass".
In the end, what he has shown is that speed is often a compromise made unnecessarily.
This doesn't really have to do with clean code anymore, as I can see how a compiler could implement what he is angry about with virtual functions in every situation where his style is applicable.
Casey has had a similar thing about the windows-terminal and somewhere in his videos a different, yet arguably worse, problem comes to mind: a lot of libraries do not care about performance enough. If you write a program and care, you may run into the problem that the library you use is your bottleneck. If this library is hard to replace (imagine needing a rocket-scientist), then you are done for. In that specific case it was DirectWrite and some other Windows-API that were slow. So if, for one reason or another, the windows team was required to use both, they'd have a hard limit on how fast they could go, just due to that. There is no "being smart" or "requiring a genius" involved in the forced/strongly recommended library here.
> If there's something wrong with that advice, I can't imagine what it is...
It will start getting really annoying when you try to add shape ‘hexagon’ and need to figure out all the places where a shape can potentially be used, just so you can update the switch statements.
Many languages provide unions or sum types along with exhaustiveness checking to make this very easy (frequently not OO-inheritance based languages though).
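Even C++ can approximate this since C++17 with `std::variant` and `std::visit`. A minimal sketch follows; the shape structs are made up for illustration, and `overloaded` is the usual helper idiom rather than something the standard library provides. If a new alternative is added to `Shape` and no lambda handles it, the `std::visit` call fails to compile, which is a form of exhaustiveness check.

```cpp
#include <variant>

// Hypothetical shape types for this example.
struct Circle   { float radius; };
struct Rect     { float width; float height; };
struct Triangle { float base; float height; };

// The sum type: a Shape is exactly one of these alternatives.
using Shape = std::variant<Circle, Rect, Triangle>;

// Common C++17 idiom for building one visitor out of several lambdas.
template <class... Ts>
struct overloaded : Ts... { using Ts::operator()...; };
template <class... Ts>
overloaded(Ts...) -> overloaded<Ts...>;

float area(const Shape& shape) {
    // Removing any one of these lambdas makes this call ill-formed,
    // so the compiler points at every visitor that needs updating.
    return std::visit(overloaded{
        [](const Circle& c)   { return 3.14159265f * c.radius * c.radius; },
        [](const Rect& r)     { return r.width * r.height; },
        [](const Triangle& t) { return 0.5f * t.base * t.height; },
    }, shape);
}
```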
Why would you go with "many languages" into a thread showcasing how C++ sucks? I'm pretty sure that had the author of the video done all the same manipulations in Python, the speed difference would've been negligible.
The author of the video discovered that C++ compiler is dumb when it comes to optimizing virtual method calls (that instead of bare virtual method calls he had to help the compiler to guess the right conditions where these virtual calls could be replaced with guessed static calls). Essentially, all that his video is saying is: "virtual calls bad if-else good". Which is like what every C++ game-dev thinks after few years on the job. Which is amusing in how short-sighted it is, and sometimes even more amusing to discover the "solutions" created by such C++ game-devs that are aimed at replacing C++ objects, but do it in a way that's even worse than C++ original design (who would've thought that to be possible!?)
If this is going to be a library where that’s a desirable feature, architect it for that feature. In the example, one easy way would have the coefficient table be expandable/replaceable. If you really need to run arbitrary user code, then write an interface that the user will conform to and call their code. You don’t even need OOP support to do that easily, just typed function pointers.
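Roughly what those two options could look like in C++. This is a sketch under assumptions: `CoefficientTable`, `AreaFn` and `call_user_area` are names invented here, and the article's own table is a fixed array, not this class.

```cpp
#include <cstddef>
#include <vector>

// Option 1: an expandable coefficient table, where
// area = coefficient * width * height for every shape type.
class CoefficientTable {
public:
    // Registers a new shape type and returns its id.
    std::size_t add(float coefficient) {
        coefficients_.push_back(coefficient);
        return coefficients_.size() - 1;
    }

    float area(std::size_t type, float width, float height) const {
        return coefficients_.at(type) * width * height;
    }

private:
    std::vector<float> coefficients_;
};

// Option 2: arbitrary user code behind a typed function pointer.
// Any function (or capture-less lambda) with this signature conforms.
using AreaFn = float (*)(float width, float height);

float call_user_area(AreaFn fn, float width, float height) {
    return fn(width, height);
}
```

The table keeps the hot path a single multiply for shapes that fit the formula; the function pointer reintroduces an indirect call, but only for the shapes that actually need arbitrary code.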
Yeah, it's very awkward. Your best option is to leave a 'hole' case where someone may provide a 'data type' set of functions satisfying an interface, and the library author simply calls them. Effectively you're adding an OO escape clause, but it's ugly and will break user code when you add more functions and grow the interface.
Conversely, in a codebase organized by objects it's not clean to add an extra method to the base class and each subclass. You have to write an external function and switch over every known subclass inside it, which is also very ugly and will also break when you add more subclasses.
The two designs are actually the duals of each other. Someone compared it to rows vs columns and it's a great comparison.
In OO, the methods are columns and each new row is a new subclass implementing them.
In FP, the types are the columns and each new row is a function that switches over the possible types.
Depends on the type of code you’re writing. If your `switch` is tightly coupled to the code that defines the cases and they’ll definitely be changed in lockstep, a default is more likely to be harmful.
If your cases are defined externally, and you need to be forwards compatible, omitting a default is wrong.
The Swift language specifically added `@unknown default` for switching over enums.
Depends on your language I suppose. I haven’t worked with a ton of compiled languages.
But we can just re-up the problem by adding 100 different shapes instead of the one. Now you have switch statements with 104 cases each spread through your codebase.
I prefer to have those 104 cases all in one place (as is the case, when it is a switch statement) rather than each in a separate file, that I need to jump around between now (as is the case with polymorphism). This situation is a bit analogous to organising things column-wise vs row-wise. And in practice I find that I need to jump around a lot less with code that uses switches than with code that uses polymorphism. Tangentially, the latter is also more prone to turning into spaghetti, as the whole is obscured by indirection levels between the parts, but you don't see the spaghetti until you try to step through the code, when debugging an issue or just trying to familiarise yourself with a new codebase.
Conversely it's really annoying to add a new method to each shape--you've got to open them all up and add shape-specific code for them with a bunch of boilerplate. With switch statements you just add one more function.
switch isn't 'invented' for enums. switch is a low-level construct which mimics several goto's / jumps. It's just as bad, except for the case where you have either: a) multiple behaviors for the same value, b) need to pass through (not break). b) is the nr 1 reason I hate switches, and my nr 2 reason is that most languages don't support proper enums, and will fail when you don't handle all possible values
Agree. It's the same as saying don't use for loops or any other basic language constructs. Switch is very useful, please leave switch alone! You will not take switch away from me :)
When do you actually pass through? Only some state machines do that. But even in that case, ifs are clearer and less error prone, given missing breaks, braces/scoping issues, etc.
> Is Casey assuming the entire commercial software industry === Uncle Bob?
It's uncharitable to take Casey as making absolute blanket statements like that, but still, it would not be unreasonable for him to single out Uncle Bob in particular.
The Amazon rankings for Bob Martin's "Clean Code":
Best Sellers Rank: #5,338 in Books (See Top 100 in Books)
This comment helped make sense of this whole comment section for me.
I work in game development, largely with optimisation. I mostly work with GPU optimisation, which is a whole different beast. On the CPU, most of the time issues are either trying to do too much stuff in a hot loop (rendering stuff that could have been culled, putting physics on objects that don't need it,...) or doing something in a slightly inefficient way in a hot loop. Because everything in the game is indeed a loop consisting of a series of hot loops.
People in this comment section call his example contrived, but it's very similar to one of the biggest performance improvements I've seen in practice.
Hot loops are where you spend your optimizing efforts. If you're going through that list of shapes again and again it very well might be worthwhile to cache some data and provide the objects with a way to update the cache.
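A minimal sketch of that idea in C++, assuming a made-up `RectList` container: the sum is cached, and the only way to change a rectangle goes through a method that keeps the cache in step.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container: keeps a running total so the hot loop can read
// the sum without walking every shape again.
class RectList {
private:
    struct Rect { float width; float height; };
    std::vector<Rect> rects_;
    float total_area_ = 0.0f;

public:
    // Adds a rectangle, folds its area into the cache, returns its index.
    std::size_t add(float width, float height) {
        rects_.push_back({width, height});
        total_area_ += width * height;
        return rects_.size() - 1;
    }

    // The "way to update the cache": swap the old contribution for the new.
    void resize(std::size_t index, float width, float height) {
        Rect& r = rects_.at(index);
        total_area_ -= r.width * r.height;
        r.width = width;
        r.height = height;
        total_area_ += width * height;
    }

    // O(1) read instead of an O(n) pass over all rectangles.
    float total_area() const { return total_area_; }
};
```

One caveat: repeatedly adding and subtracting floats can accumulate rounding error, so a real version might recompute the total from scratch every so often.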
There are a lot of clever techniques already in play to minimise the amount of data you need to consider.
Still, each triangle's position, shape, and other properties can change each frame, as does that of the camera. So you cannot avoid doing some amount of work for each of the visible triangles and their vertices each frame.
Since you need to update the screen at a consistent frequency (typically 30 or 60 times a second) and the list of triangles that actually need to be rendered each frame is in the millions... Well, that's a lot of work which cannot be avoided.
Bob Martin has made a real effort to tie himself to the specific phrase "clean code." If the author of this article had referred to clean code without using quotation marks, or talked about managing software complexity using any other term, you'd be right, but I think he's specifically talking about Bob Martin's "Clean Code" and the school of object-oriented philosophy that cleaves close to his beliefs.
The fact that you think that "everything is inside a tight loop" doesn't apply to all code today already shows your own model of code is broken because you believe the syntactic sugar modern programming languages and paradigms provide is actually reality. If everything wasn't in a loop, your program would halt once you're done with whatever you're calculating. Just because you have things like callbacks and things feel lazy doesn't mean that things do not really operate in a loop on a deep level, of course they do. You just are insulated from it because you write hooks only and such and you don't actually see the loop.
Believe it or not, callbacks are not like interrupts, there is a loop somewhere that checks the status of something and then runs the callback. All computer software today involve things that run in loops, you just don't see it. Web browsers do it! Of course they do.
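A toy version of such a loop in C++, to make the point concrete. `EventLoop` is a hypothetical class written for this comment; real event loops normally block on the OS waiting for new events instead of returning when the queue is empty.

```cpp
#include <functional>
#include <queue>
#include <utility>

// Hypothetical minimal event loop: callbacks are queued, and a plain
// while-loop is what actually runs them.
class EventLoop {
public:
    // Schedules a callback to run on a later iteration of the loop.
    void post(std::function<void()> callback) {
        pending_.push(std::move(callback));
    }

    // Drains the queue and returns how many callbacks ran.
    // Callbacks may post further callbacks while running.
    int run() {
        int ran = 0;
        while (!pending_.empty()) {
            std::function<void()> callback = std::move(pending_.front());
            pending_.pop();
            callback();
            ++ran;
        }
        return ran;
    }

private:
    std::queue<std::function<void()>> pending_;
};
```

Every slow callback delays everything queued behind it, which is how small costs pile up into visible lag.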
Moreover, he didn't contrive his example, he said that he in fact used a textbook example used by the advocates for polymorphism and such.
EDIT:
To expand on this, why is modern software slow? The reason is that people, thinking their code isn't a bottleneck, or that performance doesn't matter but adherence to the right abstractions does, write slow code and callbacks with layers of abstractions, assuming everything just happens immediately, and those little innocent steps add up when every piece of code written today is written with that same neglect. The callbacks run slowly, queues fill, promises hang, and on and on we go.
So sure, maybe your code doesn't seem like it needs to run at 60fps. But when everything I do on a computer is written like it will be run every 10 seconds, it definitely will be noticeable. That's because I don't just run your program or look at your website, I look at 10 of them or even more. If I average all of those per time, then maybe your code should in fact be able to run at 1Hz or so or I will start to notice.
People were of course right to teach new devs not to over-optimize immediately, but the culture has swung so far in the other direction, especially since you all seem to love complexity so much, that you've managed to make computers that can calculate pi faster than a supercomputer from the 70s crawl when rendering and handling an editable textbox. There has to be a move back in the other direction; you guys need to give a shit about performance, at least a little.
New developers are still watching Uncle Bob videos online and taking those strategies as the default for how you craft code, largely because there are not very many people since then making similarly grandiose claims about how software should be crafted and forming entire companies pushing adoption of those techniques commercially. We even had a young dev leave our company and form a startup around the idea of doing what we do, but a full "clean code" rewrite. Our software already has major performance issues, I'm not hopeful about the speed of his code after he layers on even more abstractions.
Exactly. The problems of software performance come from decades of poorly/quickly executed evolutionary change resulting in bad systems design. It's all a new abstraction over an older abstraction over an even older abstraction, because some old application still needs to be supported (something Casey has likely never had the problem of worrying about in game development).
Game developers have the luxury of starting from near-scratch every once in a while. That's exactly what his lauded handmade series is all about. I'm guessing that things wouldn't be so clear-cut if he was given a 10 year old codebase to iterate on.
"Normal" developers see game developers as gods walking amongst us and place far more value on their opinions than they should. The truth is that game developers and "normal" developers face equally challenging problems, just different problems. As a trivial example, an experienced web developer could probably run circles around Casey in terms of elegantly accounting for browser quirks (conversely, the web developer would probably be stumped about data oriented design). Either could learn the other's discipline, but each would have decades head-start on the other.
The idolization of gamedevs is extremely frustrating, especially when it comes to appeals to their authority.
Even more annoying than the idolization of game devs is when you open up the "Displays" submenu of the osx system preferences application, and it takes several times longer to load than the previous major os version, several seconds!, with the only significant change being a different layout, and constantly trying to ignore how nearly everything takes so much longer than necessary, wasting so much time and energy.
I agree that not everything is like a game, but it makes me legitimately sad when it seems like nobody cares about performance (aside from a few domains).
I am a developer who worked on embedded, desktop apps, mobile apps, games and now on large microservice based application. A developer is a developer and can move from one kind of application to another.
I find what Casey says in his videos to be true. And I thought about that stuff before I even watched his videos, which are excellent.
However, I started not to care. I don't want to start fights inside the company, especially fighting alone against many OOP cultists. It's not my money at stake, so if companies as a whole decide for OOP, clean code, SOLID, design patterns, abstractions on top of abstractions, making the code bases giant piles of junk while degrading performance, I am not going to go against the crowd.
Code that I write for myself is quite different than code I write for my employers.
I just hope that the industry as a whole will wake up from the whole OOP nightmare.
> I just hope that the industry as a whole will wake up from the whole OOP nightmare.
I agree with that 100%. OOP is a giant mess, no matter where your stance on code clarity or performance stands. It's objectively worse in both regards.
> something Casey has likely never had the problem of worrying about in game development.
This is simply not true; he has in all likelihood worked on such problems, given his work at RAD, whose software has been in use for 20+ years at this point.
> The problems of software performance come from decades of poorly/quickly executed evolutionary change resulting in bad systems design.
This may be true of some code bases, but it's demonstrably false for new software that's created today. Lots of new software gets built and it's slow.
Fair enough, but never forget that you can be your own hero. I have heard numerous accounts of hobby coding being used as a successful antidote to chore coding.
If you are shipping a binary to your users that will never be able to get updates, your cautiousness would be justified. There are other situations where it needlessly limits options.
There are situations where I have knowingly written O(n^2) or worse, and put a # xxx dragons marker by it. Quick to write, leave my options open, keep my momentum on the problem I care about.
I will grep for xxx issues at some later time. I may end up throwing out the code before that happens. If I hit big-o issues before then, I can refactor.
I once had a system where the important problems turned out to be a series of IO bottle-necks - nothing to do with computation - but that was obscured because good sense had been burnt at the altar of compute efficiency.
It is the exact same situation, though. Most people can just chain a sort and a binary search to do it, and both are included in most languages.
Or just put it into a tree map, if your language has it.
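A minimal sketch of the sort-then-binary-search idea, using Python's standard `bisect` module. The helper name `contains_sorted` and the sample data are made up for illustration; the thread does not say what is actually being searched.

```python
import bisect

def contains_sorted(sorted_items, target):
    """Return True if target is in sorted_items (which must already be sorted)."""
    i = bisect.bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target

# Sort once: O(n log n). Each lookup afterwards is O(log n) instead of O(n).
ids = sorted([42, 7, 19, 3, 88])
found = contains_sorted(ids, 19)
missing = contains_sorted(ids, 20)
```

If the collection changes often, a sorted container or tree map avoids re-sorting; for a static collection the sort is paid once.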
The point is that the developers may think O(n^2) is fine because their toy use cases had n=10...100, but then actual users will try to use the software for n=10k, or n=100k, and then either waste their lives working with suddenly slow software, or look for alternatives.
I walked into a case like this the other day. I wanted to do a little semi-collaborative project planning. I found a nice tool, played with it for a moment, figured it has the functionality I need and it's fast enough. Then decided to do the actual plan. Once the number of entries in the system went from 10-20 to 30-40, I started to feel things get a little laggy. 50-60, more laggy. At this point I was committed, so I suffered the tool for couple of months, as its UI kept breaking when handling 100 entries. If I knew this would happen at the start, I'd look for something else. But instead, I walked into a hidden O(n^2) somewhere, that makes me hate the product with a passion now.
It's more than that. The way black box composition is done in modern software, your n=100 code (say, a component) gets reused in another thing somewhere above, and now you're being iterated through m=100 times. Oops, now n=10k.
Generally, Casey seems to preach holistic thinking, finding the right mental model and just write the most straightforward code (which is harder than it looks; people get distracted in the gigantic state space of solutions all the time). However this requires 1. a small team of 2. good engineers. Folks argue that this isn't always feasible, which is true, but the point of these presentations is to spread the coding patterns & knowledge to train the next gen of engineers to be more aware of these issues and work toward said smaller team & better engineers direction, knowing that we might never reach it. Most modern patterns (and org structures) don't incentivize these 2 qualities.
> The way black box composition is done in modern software, your n=100 code (say, a component) gets reused into a another thing somewhere above, and now you're being iterated through m=100 times. Oops, now n=10k
That doesn't seem quite right, as 100 * (100^2) <<<<< 10000^2
Yeah I was only talking about quantities. Equivalently, assume that it's a linear algorithm in the child and a linear one in the parent. Ultimately it ends up as O(nm) being some big number, but when people do runtime analysis in the real world, they don't tend to consider the composition of these blackboxes since there'd be too many combinations. (Composition of two polynomial runtimes would be even worse, yeah.)
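To put numbers on the exchange above, here is a small sketch that counts basic steps when a linear "child" is called from a linear "parent". The names `child` and `parent` are hypothetical stand-ins for two black-box components; the two quadratic figures are just the arithmetic from the objection, not measurements.

```python
def child(items):
    """A 'black box' component that does linear work over its n items."""
    steps = 0
    for _ in items:
        steps += 1
    return steps

def parent(groups):
    """A caller that invokes the child once per group, m times in total."""
    return sum(child(g) for g in groups)

n, m = 100, 100
total_steps = parent([range(n)] * m)   # n * m basic steps

# The quadratic case, for comparison:
nested_quadratic = m * n**2            # O(n^2) child called m times
flat_quadratic = (n * m) ** 2          # O(N^2) over all N = n * m items
```

Neither component looks expensive on its own; the cost only shows up when someone multiplies the two together.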
Basically, performance doesn't compose well under current paradigms, and you can see Casey's methods as starting from the assumption of wanting to preserve performance (the cycles count is just an example, although it might not appeal to some crowds), and working backward toward a paradigm.
There was a good quote that programming should be more like physics than math.
There was a fun example in the Julia compiler around a year ago.
Part of the compiler was O(N^2) in `let` block nesting depth. That is
let x = foo(), y = y, z = 2y
...
end
would be a depth of 3. It didn't seem like that should be a problem, N is never going to be 10, let alone 100, right?
Until suddenly, `N` was in the thousands in some critical generated code spit out by some modeling software, so that handling the scoping introduced by `let` suddenly dominated the compilation time...
The "contrived textbook example from 20 years ago" still has a very real impact today. In my experience there are still lots of development teams that are instructed to develop in a "Clean code" style in the flavour of Uncle Bob. It's especially true in the .net development space, and is almost a cultural problem within .net.
As a former .net developer that was often pushed into "clean code", my big takeaway from the video was that actually, not using "clean code" techniques, such as polymorphism made the code so much more readable and easier to grok that the optimisation that followed was completely natural.
There are a lot of people who advocate "clean code" principles without ever having read or knowing about Uncle Bob, because those "enterprise java dev like 10 years ago" folks sort of seeped into the industry.
It's the same thing with TDD zealots. Or any other fad driven development paradigm, which our industry is filled with.
This summarizes my impression of the article. It reads like a freshman CS TA giving a "well, akshually" speech. This is all old news- everyone knows vtables are "slow" in some very loose sense of the term. In general, these optimizations don't make a big enough difference to be worth considering while designing your code. In very particular domains, these ideas are valid, but a lot of that code is written in C so there's no dynamic dispatch anyway.
All of these are instances of doing something _wasteful_, which is the #1 issue he mentions in the list of things that cause performance degradation.
Now, your argument seems to be: in the real world, there's so much waste, that virtual function calls pale in comparison.
This does not debunk his main point, which seems to me at least the following: all things being equal, writing code with virtual functions that do a tiny amount of work and "hiding implementation details" makes performance worse, sometimes by an order of magnitude.
Now, there may be situations where you _have_ to use virtual functions, because you are writing a library for other people to use, and you can't dictate ahead of time how they will use it.
This again does not invalidate the point. You need to be _aware_ of the performance implications of this, and mitigate it. He said the following in the comment section on the article:
> Try to make it so that you do very rare virtual function calls, behind which you do a _large_ amount of work, rather than the "clean" code way of using lots of little function calls.
>all things being equal, writing code with virtual functions that do a tiny amount of work and "hiding implementation details" makes performance worse, sometimes by an order of magnitude
but all things are not equal. You can spend a lot of time improving the performance of your function calls and get virtually nothing out of it. Because if you optimize something that takes 0.01% of overall execution time, an 'order of magnitude' performance gain is still negligible.
Also articles like this usually fail to mention code maintenance cost. For example by reducing usage of virtual calls you can make your code unmaintainable/expandable and suddenly every new change will cost you 2x more in development time.
That's why in the real world most of the time you choose clean code, and you use optimized non-clean code only in places where you need it. If you look at the internals of, let's say, any web framework, you will find a lot of non-clean code, which makes the framework faster. But the interface will be done in a clean fashion, and most users of the framework will enjoy clean code without needing to care about the unclean internals.
This article is not aimed at people who are working in a codebase where everything is super terrible. It's written for performance aware programming. One of the aspects of performance awareness is awareness of how virtual functions and tiny functions affect performance negatively.
> Also articles like this usually fail to mention code maintenance cost. For example by reducing usage of virtual calls you can make your code unmaintainable/expandable and suddenly every new change will cost you 2x more in development time.
This never actually happens in real life. I've never seen a codebase that is written with "clean code" principles in mind that is also maintainable and easy to develop on top of.
"Write slow code now, profile and optimize later" is how we got all this slow software, because the second step, optimization, practically never happens in my experience.
Along with that goes the heuristic that hardware and electricity are cheap and developers are expensive. That's probably why managers, in my experience, almost never ask developers to optimize slow code: they believe it would be cheaper to use more hardware if that fixes the problem (in some areas like HFT or gamedev more hardware is not the answer, so optimization does happen). In the rare cases where I've seen optimization being done, the initiative always came from an IC (who knew that the code could run faster or use fewer resources).
So nowadays, when writing code, I assume that it will never be optimized later and try to do less dumb stuff from the beginning.
The heuristic is one of those for the mid-experience developer. That's the point where you spend two weeks re-implementing Set<T> to save 1 cycle. Or over abstract. Lots of developers never get past that point.
Experienced devs - such as yourself - generally know how to write reasonably fast code that is also clean (or easy to extend) first time. In my opinion we should be more explicit about this in the heuristics.
>Write slow code now, profile and optimize later is how we got all slow software because second step
We got slow software because we were ok to get slow software. If being fast is not in requirements it means it doesn't matter (for whoever is responsible for defining priorities). Places where performance matters it is never sacrificed.
Doesn't matter for what/who? It doesn't matter from the business angle since there isn't any competition that works better. But it definitely matters for me.
I am not okay with slow software, and being forced to use it is frankly insulting. Teams (to pick a punching bag) hogging my system resources doesn't impact its adoption since I'm forced to use it. Teams could be 10x slower and I'd probably still be forced to use it. But it wouldn't be because speed doesn't matter.
This is a fully general counterargument for why anything that is bad is actually good because it doesn't matter. "Places where air quality matters it is never sacrificed".
It doesn't matter in the sense that most companies continue to be profitable even while paying significantly more for hardware / clouds than they would need to. But it is sad to see nevertheless. Also, it increases CO2 emissions.
If you program using design patterns that are 10x slower, your application ends up 10x slower, even after you've optimised the hot spots away, and the profiler will not give you any idea that it could still be 10x faster.
If your profiler doesn't show any hotspots (which is incredibly rare in practice) and your program is still slow, it means you can simply pick any of whatever functions/methods show up near the top to optimize.
If your program isn't slow then you don't need to bother making it any faster. Even Michael Abrash, who specializes in code optimization, once explicitly wrote in his graphics programming black book that "The objective (not always attained) in creating high-performance software is to make the software able to carry out its appointed tasks so rapidly that it responds instantaneously, as far as the user is concerned. In other words, high-performance code should ideally run so fast that any further improvement in the code would be pointless [..] Notice that the above definition most emphatically does not say anything about making the software as fast as possible".[0]
Think of using slow patterns as using a slow programming language. After you've optimised your python program and eliminated all hotspots, the python profiler isn't going to say 'hey, this could be still 10x faster if rewritten in go'. Note that if you write a graphics engine, you aren't going to use python or even go, but something more like c, c++, rust, even though your code would be cleaner in python. Clean code techniques elevate your level of abstraction and prevent some problems, but at the (significant) cost of performance. It's of course always a matter of trade-offs and this matters more or less depending on which problem you are trying to solve.
> After you've optimised your python program and eliminated all hotspots, the python profiler isn't going to say 'hey, this could be still 10x faster if rewritten in go'.
No, it won't say something like that, but if you profile Python itself and see that a lot of time is spent in the interpreter, you can get a hint that rewriting it in another language that doesn't have Python's overhead might help.
If you're able to make a program into a sort of puzzle concealing its state inside a twisty maze of tiny virtual functions so it's impossible to see plainly how anything is done, then you become indispensable as the only one who's internalized how the thing works.
And then you insist it's all for easier comprehension… for those sufficiently intelligent to comprehend it.
If the average programmer didn’t use ‘clean code’ techniques they would end up with crazy spaghetti, not the fast and clear code shown in the video. (the average programmer is not on HN and can’t do fizzbuzz)
Casey does not advocate for code optimization. Not writing code in a way that is known to degrade performance does not mean you are doing optimization.
> If your profiler doesn't show any hotspots (which is incredibly rare in practice) and your program is still slow, it means you can simply pick any of whatever functions/methods show up near the top to optimize.
> If your program isn't slow then you don't need to bother making it any faster.
But you can only judge that for the very limited set of hardware you personally have access to. You might have many users -- or potential users -- with slower CPUs.
This is the same as trying to guess the future that creates "astronautical architecture". You set a target hardware you're willing to support (e.g. PCs released in the last 10-15 years) and anything less is out of scope of the project. You don't need to support 8086 PCs for example.
Well, for starters, Amdahl's law exists. If you use design patterns that are 10x slower on a code path that takes only 0.5% of the execution time, your application ends up roughly 0.45% slower. And the profiler never tells you how much faster a code path can be. It just tells you where you are spending most of your time so you can put the effort where it matters.
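A quick sketch of the arithmetic behind that Amdahl's law argument. The function names are made up, and the two readings of "takes 0.5% of the execution time" are assumptions about what is meant; either way, the whole-program effect of a 10x difference on such a small path is a few percent at most.

```python
def overall_speedup(fraction, factor):
    """Amdahl's law: total speedup when a path taking `fraction` of the
    runtime is made `factor` times faster."""
    return 1 / ((1 - fraction) + fraction / factor)

def overall_slowdown(fraction, factor):
    """Relative total runtime when a path that took `fraction` of the
    original runtime becomes `factor` times slower."""
    return (1 - fraction) + fraction * factor

# Reading 1: the path takes 0.5% of the time *with* the slow pattern in place.
# Making it 10x faster speeds the whole program up by under half a percent.
speedup = overall_speedup(0.005, 10)     # about 1.0045

# Reading 2: the path takes 0.5% *before* the slow pattern is introduced.
slowdown = overall_slowdown(0.005, 10)   # about 1.045
```

The same functions show why the counterargument also holds: plug in a fraction of 0.9 instead of 0.005 and the 10x factor dominates the result.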
This is precisely the kind of situation where it's imperative to consider performance before starting to build a new system.
When you hit Amdahl's law, it's because you (or someone else) has made decisions about the high level design/patterns to use in a system. To remove such bottlenecks, you may have to scrap the entire project and start over.
For the inner loops, it's perfectly fine to leave most of the optimization for later. But for overall design, it needs to be right from the start.
Time and time again, I see devs stuck in some paradigm (often front-end devs or low-volume RESTful microservices) that makes it almost impossible to handle non-trivial data volumes or traffic, causing new products to fail.
I mean, you always hit Amdahl's law, and the point is that most often the time limits are not that related to the architecture. Let's say you do these "10x slower patterns" in a backend application, in the part where the DB model gets translated to a response to give the client... Yeah, maybe that pattern is slower but most of the time of the response is spent on the network and the DB response.
I do agree that overall design needs to fit performance requirements but for the most part that has nothing to do with "clean code" patterns.
It's rare that it's the algorithm and not the design that is slowing you down. Sometimes it is the algorithm, but usually it's the design.
As an example, I recently worked on a large system that was optimized to do a big data transformation in an efficient way. It turns out that data is transformed back to the original format later downstream. So much for the optimization...
All that is to say, often it is the case that "simple > fast"
Clean code at the time had a lot going for it, and it was an improvement over a lot of JavaEE code that was written in absolutely procedural ways, with less care for the developer reading the classes, functions, or individual statements than for punching out near assembly-like code and moving on.
All of the advice in that article isn't going to bring your server latency for an API call down from 1000ms to 30ms, but rather from 30ms to 25ms. So sure, if you absolutely must optimize that 30ms call after you have fixed everything else then go ahead, but very few are at that stage or will ever get to that stage. And if you try to optimize that last 5ms at the expense of the much larger issues then you are actually making things worse.
> All of the advice in that article isn't going to bring your server latency for an API call down from 1000ms to 30ms, but rather from 30ms to 25ms.
Of course it will.
If your backend service is already suboptimal, and running at 10x worse performance, optimizing that will give you, well, a 10x performance boost.
Imagine replacing the poor in-memory reimplementation of database queries that most graphql servers do with actual optimised database queries. And better code on top.
Boom. You're operating close to the speed of light.
>Imagine replacing poor in-memory reimplementation of database queries that most graphql servers do with actual opttimised database queries. And a better code on top.
but you are actually talking about optimizing system design, and not reducing virtual functions calls :).
And that's the point of this thread: you need to optimize the parts that slow you down the most.
So no, in most cases optimizing virtual calls won't bring you from 1s to 30ms
no
take your typical web service application. Even if you use design patterns that are 10x slower, your program will still be as fast as your DB and overall system architecture. And the choices on your DB schema, indexes and caches will have 100x more effect on your 99th-percentile response time than the design patterns you use.
That's only true if the DB is equally as problematic, and your programming language/framework itself isn't compounding the overhead caused by the "10x slower" architecture.
On slower languages with slower runtimes, something that is 10x slower than normal code will have much more overhead than in the examples demonstrated by Casey. It won't be about "30ms vs 25ms" as some people are saying. In the past I remember seeing differences between 400ms and 20ms between JBuilder and .to_json in a critical endpoint in a Rails app, to give one example. Sure, one is "cleaner", but in the end it's a 20x overhead that has no place in this case.
Also, the myth that "processors spend time waiting for IO" that is spread across this thread is BS. In reality, that's only true for single-user programs. If the app is part of a distributed system, the CPU time can be used to serve more users. This allows you to significantly delay more complex scalability efforts, which is also precious developer (or DevOps) time.
Not to mention that applying "Clean Code" in the first place also takes precious time, which could be used for features or anything making money, even optimizing the DB. Instead, this time is used to mess up the code in ways that have zero proven efficacy, and that some developers think are outright terrible.
I would argue that persistence and caching strategies are also design patterns. Of course, if you're not the tech lead/architect it may be out of your hands.
If you're working with databases, performance often boils down to minimizing the number of times you have to access the database (over a multi-service stack, not just a single web service).
If you can remove thinly spread overhead from your web framework or any wasteful ritual dance you make for each request, it might not be much, but it does add up over the course of 24 hours * number of workers to quite a bit.
Not necessarily in the general case. If the program is a single-user, locally run program, then sure. If it's some sort of backend or distributed service, this is just wasted performance that can be used to serve more users. Virtually every distributed service built today is able to take advantage of this.
With a non-pessimal design, not only are you able to pay less money for servers in the long term, you're also able to delay complex scaling strategies. Scaling also costs money.
Not to mention that building something with this kind of overhead also means that a lot of developer time was spent in the first place, which is still expensive in our industry.
ORMs and slow DB queries kind of go hand in hand. Also, you'd be surprised at how efficient arrays are for membership checks so long as the number of items is even moderately small (as a rough rule of thumb, the only thing that really matters with these kinds of checks is how many cache lines are involved).
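A rough way to check the small-array claim for yourself. This sketch compares membership tests on an 8-element list and a set; the size 8 and the helper names are arbitrary choices for illustration. Note that Python lists store pointers rather than contiguous values, so the cache-line argument applies more directly to arrays in languages like C, C++, or Rust; treat this only as a template for measuring.

```python
import timeit

small = list(range(8))          # a handful of items
small_set = set(small)

def in_list(x):
    return x in small           # linear scan, but over very few elements

def in_set(x):
    return x in small_set       # hash lookup

# Both give the same answers; only the cost differs.
same_answers = all(in_list(x) == in_set(x) for x in range(-5, 15))

# Timings vary by machine and interpreter, so no expected numbers are given.
list_time = timeit.timeit(lambda: in_list(7), number=100_000)
set_time = timeit.timeit(lambda: in_set(7), number=100_000)
```

Growing `small` to a few thousand elements and re-running shows where the linear scan stops being competitive on a given machine.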
Well said. The aphorism "Premature optimization is the root of all evil" is meant to mean "Build it right first, then optimize only what needs to be optimized". There's really no need to start cooking spaghetti right off the bat. Clean code with some performance tweaks will be more maintainable in the long run without sacrificing performance.
Casey's implied point is that clean code is already sacrificing performance from the start. And of course real life tells us that those "performance tweaks" will never happen.
There is this popular wisdom that security must be designed for from the start, and cannot be just added after the fact. Performance is like that too, except worse, because you actually can add security after the fact - worst-case, you treat the entire system as untrustworthy and wrap a layer of security around it. You can't do that with performance - there is no way to sandbox your app so it goes faster. You can only profile things then rip out the slow parts and replace them with fast ones - how easy that is depends on the architecture and approach you adopt early in the project.
> And of course real life tells us that those "performance tweaks" will never happen.
I seldom wish I could post images on here, but I really would love to share a photo of the surprise coffee mug my work sent me, with a screen cap of a completely obliterated Y axis on a performance monitoring graph. Granted I don’t get to spend all my time hunting optimizations, but some teams/orgs/companies do very much value performance very explicitly.
Edit: and I’m definitely not a game dev. Though I’ve been itching to borrow some game dev techniques that are quite applicable for my domain (particularly ECS, entity component systems, which I suspect have far broader applicability than their adoption outside of game dev).
This is not my experience at all, at my org we always write it the simple way first, and if it needs more performance after the fact then we always add the performance. User friendliness always comes first. This includes not waiting 20 seconds for a db query that could be rewritten to happen in 1 second, but it also means not waiting a month for a bugfix which could happen in a day with maintainable code.
That's a nice approach, and I envy you the environment you work in.
In my experience so far, it's typically the case that the 20 second DB query will annoy people for months or years - it won't get solved until enough people raise enough of a stink that someone finally prioritizes it. A large customer suddenly starting to make vague hints about bad performance is sometimes (but not always) helpful.
Some may say that "annoying, but not enough to make a stink about it" means it's fine to not optimize it. But I found that people can suffer a lot, and it doesn't mean it's harmless. People will adjust their workflows to minimize the frustration. When some of your "victims" are in-house users, the "not important enough" performance issue may be silently but continuously losing company money.
> In my experience so far, it's typically the case that the 20 second DB query will annoy people for months or years - it won't get solved until enough people raise enough of a stink that someone finally prioritizes it.
Among the many things I’ve learned as a self taught dev: you can be the person who raises enough of a stink if you care a lot. It’s not a thing you want to invoke frequently, but it’s a thing you very probably have power to invoke where it matters most. If you can make a good business case (or any case for user success that impairs your org), you have very good odds of being able to pursue it in any but the most toxic situations. If you can link $thing-you-want-to-pursue to other probably shinier biz/org goals, you’re 99% of the way there.
> there is no way to sandbox your app so it goes faster.
In a number of cases you actually can. Casey demonstrated it in his Refterm lectures, it's caching. You still call the slow thing, but at least you don't call it as often because you have that layer of caching to partially insulate you from its poor performance. Good luck if you have to deal with cache invalidation, though.
I'll concede on saying that performance and security are alike - you can add some of either after the fact, but you're better off thinking about both from the start.
> Good luck if you have to deal with cache invalidation, though.
Ain't it the truth. Adding a cache is easy. Understanding the implications of doing it is harder.
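A minimal sketch of the "put a cache in front of the slow thing" idea, using Python's standard `functools.lru_cache`. `slow_lookup` is a hypothetical stand-in for whatever slow call you cannot change; this is not how Refterm does it, just the general shape.

```python
from functools import lru_cache

call_log = []

def slow_lookup(key):
    """Stand-in for the slow thing you cannot change (hypothetical)."""
    call_log.append(key)
    return key.upper()

@lru_cache(maxsize=1024)
def cached_lookup(key):
    return slow_lookup(key)

results = [cached_lookup(k) for k in ["a", "b", "a", "a", "b"]]
calls_made = len(call_log)      # only the first "a" and first "b" reach slow_lookup

# Invalidation is the hard part: if the underlying data changes,
# stale entries must be dropped explicitly.
cached_lookup.cache_clear()
```

The cache only insulates callers from repeated work; the first call for each key is still as slow as before, and deciding when to call `cache_clear()` is exactly the invalidation problem mentioned above.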
This isn't true in my experience. Even in the context of games development. It pays to be simple at the outset because you often need to iterate code to get it right and writing and iterating optimised code is harder and slower than just doing something simple first. And YES, we ALWAYS went back and optimised the slow bits.
But "being simple from the outset" is exactly what Casey is advocating here. Start with the simple code, that he ends up with, rather than optimizing for a "Cleanliness" metric.
His final code is definitely simpler than the alternative, which would probably involve several files in another environment.
Sure, he does reach for a benchmark, but that's merely to demonstrate the end result.
The argument continues to say that the 'clean code' version isn't actually cleaner/more legible/more understandable.
I, for one, am royally sick of spending 5 minutes trying to figure out which virtual function implementation was actually called!
Adding performance to software after the fact is typically far easier than securing it. Layers of security have to be watertight, and generally they are not if you don't design them from the start to be. A good performance engineer works in the constraints of their project to balance minimal code changes with performance wins.
To be fair to OP, does anything about your current paradigm even allow you to evaluate his claims? Is your code even in a form that you can for example remove polymorphism and use simple arrays of data you want to work over?
You can of course only optimize what you are looking to optimize. I am (honestly) not surprised that some engineers do not realize that the mere choice of where they look primes them for the kind of things they will find.
Maybe that's valid for a low user count. We run a large multitenant, microservice-based application, and the physical machines where the Kubernetes pods reside have their CPUs at 90%. The application makes such heavy use of "clean coding", "design patterns", and SOLID that it would make Uncle Bob proud. We would have been better off without so many abstractions on top of abstractions.
This is my experience. As I have said elsewhere, perhaps I am unlucky and work with "bad" developers, but everything being built for the future at $COMPANY is over complicated and has a multitude of abstractions.
Honestly, for most software, a well performant "monolith" is probably enough.
We had one service that had a single-thread bottleneck that none of the developers could figure out - the solution was to spin up 30x more virtual machines to run instances of this app to meet average production demand.
In a business context, it usually happens that hardware is cheaper than software (licensing) which is cheaper than engineering labor. Tack on the opportunity cost of delaying business advances/features and it's usually cheaper to just throw hardware at it.
There's a tipping point where you have so much hardware there's big savings with optimization. Things like Postgres and the Linux kernel have a lot of optimization put into them and there's an insane amount of hardware out running that code.
I'd actually say this article is generally unhelpful - it's good to be aware but as someone who works on sorting out performance critical things I want the code to be as clean as humanly possible going in. Whether you write clean or dirty code if you're a junior developer you're probably not going to write performant code and even senior devs may be able to sniff what might be a bottleneck in advance but most of us have learned to avoid premature optimization like the plague.
Maintainability and cleanliness is the best virtue code can have. If you have extremely clearly written code that has performance issues I can swoop in with analysis tools figure out where the pain point is and refactor it out. Sometimes this is a real headache[1] sometimes not - what I can guarantee is that if the code is "dirty" it's going to be a headache and it'll take more time.
I'd personally take issue with this article over the polymorphism claim though - polymorphism is a tool but it isn't the be-all and end-all tool. A lot of your data can live as structs/blobs in memory with tight internal type definition but without any OO principles. Personally I am a huge fan of functional programming (but not pure functional programming) so objects that I use are relatively few and far between and exist to fulfill a very specific purpose.
I've had two occasions in working when I needed to break out an asm block - the compiler was being a thick headed dummy and this code needed to receive incoming signals without exception or delay - but once that critical section was passed? Back to high level programming and statements favoring expressiveness over raw bare metal performance.
If you want an interesting experience, talk to your closest non-technical manager type - be that a product team manager or the company owner - and ask them whether they'd prefer that you focus on reducing how long your product takes to execute by 20% over the next five years, or that you lower the growth of the developer labor budget by 20% for the next five years by focusing on maintainability over performance. With the exception of extremely niche cases, maintainability is always the gold standard.
1. For instance, I've dealt with OOM issues that have required transforming all logic on a query result to be lazily evaluated on a data stream after main execution finishes - like the logic goes up and down the stack and only then begins processing results. In this particular case the problem was rather easy to deal with because we essentially swapped out the actual value passing on each layer for a lazy result set being passed around - because the code was clean. Sometimes you'll definitely need to massively re-engineer things though.
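To make the footnote's "swap values for a lazy result set" idea concrete, here is a minimal Python sketch. The three layers (`fetch_rows`, `enrich`, `summarize`) and the row shape are hypothetical, invented for illustration; the point is only that each layer passes a stream along instead of a fully built list.

```python
from typing import Iterable, Iterator

def fetch_rows(n: int) -> Iterator[dict]:
    """Hypothetical stand-in for a query cursor: yields one row at a time."""
    for i in range(n):
        yield {"id": i, "amount": i * 10}

def enrich(rows: Iterable[dict]) -> Iterator[dict]:
    """Middle layer: transforms rows lazily instead of building a list."""
    for row in rows:
        yield {**row, "total": row["amount"] + 1}

def summarize(rows: Iterable[dict]) -> int:
    """Top layer: consumes the stream, so rows are produced on demand."""
    return sum(row["total"] for row in rows)

# Nothing is computed until summarize() starts pulling rows through the layers.
total = summarize(enrich(fetch_rows(1000)))
```

Because every layer already took and returned "some rows", switching from lists to generators changes the memory profile without changing the call structure, which is roughly the situation the footnote describes.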
I feel like there's a misunderstanding here. Casey is clearly not against writing non-capitalized clean code at all. His code in the end is "cleaner" than what he criticizes IMO. What he is criticizing here is capitalized (and possibly trademarked) "Clean Code", the book and philosophy spearheaded by Uncle Bob.
I agree that correctness is pretty essential (as in - actually does what it says, though something that's mostly correct is almost always the bar... most software doesn't need to be entirely correct). But I am confused about "quickly": do you mean dev time or execution time?
Depends. For example, if the slowness comes from sequentially emitting a lot of http requests, a lot of performance can be gotten from doing it concurrently.
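A rough Python sketch of that difference, assuming the requests are independent of each other. `fake_request` is a hypothetical stand-in that sleeps instead of touching the network, so the numbers only illustrate the shape of the win.

```python
import asyncio
import time

async def fake_request(i: int, latency: float = 0.05) -> int:
    """Hypothetical stand-in for an HTTP call: waits, then returns a value."""
    await asyncio.sleep(latency)
    return i * 2

async def sequential(n: int) -> list[int]:
    # Each request waits for the previous one to finish.
    return [await fake_request(i) for i in range(n)]

async def concurrent(n: int) -> list[int]:
    # All requests are in flight at once; total time is about one latency.
    return list(await asyncio.gather(*(fake_request(i) for i in range(n))))

def timed(coro_fn, n: int):
    start = time.perf_counter()
    result = asyncio.run(coro_fn(n))
    return result, time.perf_counter() - start

seq_result, seq_time = timed(sequential, 10)
con_result, con_time = timed(concurrent, 10)
```

Both variants return the same list in the same order; only the wall-clock time differs. Real services usually also need a concurrency limit, which this sketch leaves out.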
What I've seen is slow queries but a bigger problem is actually too many queries. It's easy to do especially when using an ORM.
It mostly happens on change: when someone wants to add something to an existing query, they just add their new query and slop it into a loop, and boom, performance is gone.
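For anyone who hasn't run into it, this is roughly what the "query in a loop" pattern looks like next to a single joined query. The schema and function names are made up for the example, and it uses an in-memory SQLite database rather than any particular ORM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Linus');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 1);
""")

def names_n_plus_one(conn):
    """One query for the orders, then one extra query per order."""
    orders = conn.execute(
        "SELECT id, customer_id FROM orders ORDER BY id").fetchall()
    queries = 1
    result = []
    for order_id, customer_id in orders:
        (name,) = conn.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)).fetchone()
        queries += 1
        result.append((order_id, name))
    return result, queries

def names_joined(conn):
    """Same data in a single round trip."""
    rows = conn.execute("""
        SELECT o.id, c.name
        FROM orders o JOIN customers c ON c.id = o.customer_id
        ORDER BY o.id
    """).fetchall()
    return rows, 1

loop_rows, loop_queries = names_n_plus_one(conn)
join_rows, join_queries = names_joined(conn)
```

With three orders the loop version already issues four queries; against a real database each of those is a network round trip, which is where the time goes.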
One I found with profiling--when writing the code n was quite small. Many database operations simply iterated over an array to decide where to store an item. The time spent dealing with the data was a tiny fraction of the database round trip time, there simply was no reason to get fancy.
Over the years they've grown and one bin showed up where jobs would sit around for weeks--for that case n went from rarely having a screenful of lines to a few thousand--oops, now more than 2/3 of the time was spent in those searches. (And I have a sneaking suspicion that a good portion of the remaining time comes from using field names to retrieve values. The profiler doesn't separate that out, though, because it's not my code.)
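A small Python sketch of the same effect, with made-up bin names: a linear scan is fine for a screenful of entries and becomes the dominant cost at a few thousand, while a dictionary index stays flat. The sizes and names here are assumptions for illustration, not the original system.

```python
import timeit

bins = [f"bin-{i}" for i in range(5000)]
index = {name: position for position, name in enumerate(bins)}

def find_linear(name: str) -> int:
    """Scans the list from the start: cost grows with the number of bins."""
    return bins.index(name)

def find_indexed(name: str) -> int:
    """Hash lookup: cost is roughly constant regardless of size."""
    return index[name]

target = "bin-4999"  # worst case for the linear scan
linear_time = timeit.timeit(lambda: find_linear(target), number=200)
indexed_time = timeit.timeit(lambda: find_indexed(target), number=200)
```

Whether this matters depends entirely on n and on what else the code is waiting for, which is why it only showed up in the profiler years later.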
Author could use a little more memoization in his example, but I suspect that breaks some of the simplicity of his argument.
If shape Area is computed often enough that you care about inlining the calculation, why not compute & store it every time the height / width change. That’d be easy enough in an architecture based on information hiding, and might illustrate a legitimate engineering trade-off between those architectural choices.
Right? His Area() function does the calculation from scratch on every call. Either a) make the Shape immutable and calculate the area once, at creation time, or b) have the mutator functions recompute the area when they are called. At that point Area() just returns an f32, and the compiler can do all kinds of optimizations.
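Sketching both options in Python rather than the article's C++ (so this says nothing about what a C++ compiler would do with it): `Rectangle` computes its area once at construction, and the hypothetical `MutableRectangle` recomputes it inside the mutator. Class and method names are mine, not the article's.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Rectangle:
    """Option a): immutable shape, area computed once at creation."""
    width: float
    height: float
    area: float = field(init=False)

    def __post_init__(self):
        # frozen dataclasses block normal assignment, so set the cached
        # value through object.__setattr__ during initialisation only.
        object.__setattr__(self, "area", self.width * self.height)

class MutableRectangle:
    """Option b): the mutator keeps the cached area up to date."""
    def __init__(self, width: float, height: float):
        self._width = width
        self._height = height
        self._area = width * height

    def resize(self, width: float, height: float) -> None:
        self._width, self._height = width, height
        self._area = width * height  # recompute on change, not on read

    @property
    def area(self) -> float:
        return self._area  # plain read, no arithmetic

fixed = Rectangle(3.0, 4.0)
movable = MutableRectangle(2.0, 5.0)
area_before = movable.area
movable.resize(3.0, 3.0)
area_after = movable.area
```

The trade-off is the usual one for caching: reads get cheaper, writes get slightly more expensive, and the object carries one more field.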
Indeed. That's the point. Why let someone with an axe to grind define the problem in a way they can solve with their axe? The author is using optimization as the frame for rejecting a particular way of working, I'm pointing out that the definition is set up to make the solution work. I don't concede the terms of the debate before it even begins.
Speaking of mutators, his example depends on the code executing in a single thread. What happens if calls to mutate the shape and calls to calculate the area are interleaved?
> Look, most modern software is spending 99.9% of the time waiting for user input, and 0.1% of the time actually calculating something. If you're writing a AAA video game, or high performance calculation software then sure, go crazy, get those improvements.
That's really not even close to true. Loading random websites frequently costs multiple seconds worth of local processing time, and indeed, that's often because of the exact kind of overabstraction that this article criticizes (e.g. people use React and then design React component hierarchies that seem "conceptually clean" instead of ones that perform a rendering strategy that makes sense.)
> That's really not even close to true. Loading random websites frequently costs multiple seconds worth of (...)
You attempted to present an argument that's a textbook example of a hyperbolic fallacy.
There are worlds of difference between "this code does not sit in a hot path" and "let's spend multiple seconds of local processing time".
This blend of specious reasoning is the reason why the first rule of software optimization is "don't". Proponents of mindlessly going about the 1% edge cases fail to understand that the whole world is comprised of the 99% of cases where shaving off that millisecond buys you absolutely nothing, with a tradeoff of producing unmaintainable code.
The truth of the matter is that in 99% of the cases there is absolutely no good reason to run after these relatively large performance improvements if in the end the user notices absolutely nothing. Moreso if you're writing async code that stays far away from any sort of hot path.
The subfields of programming I know the most about are game development, networking, and web development, and in all of those, it's not the case that only 1% of the code is in the "edge case" where performance matters at all.
For example, in the case of web development, if you build a medium-sized website with React (i.e. pretty normal behavior nowadays), then if you make default decisions that don't consider performance at all, your website will end up noticeably slow, because you will:
1. Write code that re-renders components all the time during loading and UI interactions,
2. Which depend on tons of third-party dependencies that perform poorly,
3. So you end up spending a ton of time in re-renders while the site loads and while someone is using it.
Dealing with this isn't literally the same performance work that Casey put in his article, because it's at a slightly higher level of abstraction, but it requires the same mindset. It requires writing most of your code (and taking on dependencies) with performance in mind, not just 1% of it. You can't avoid it without your notion of "clean code" including some amount of mechanical sympathy, rather than just being about abstract extensibility and generalization concerns.
It's also very easy to mess up performance through architecture when using modern frameworks. For example: as much as SPAs are reviled, if you have a complex enough web application with heavy components, making it an SPA might be better in terms of perceived speed than having it do a full reload on every navigation.
This is a mistake that I see far too many government and utility sites making.
> So why do 99% of real world applications run like garbage?
If you're really interested in the impact of performance issues on everyday life, you need to provide concrete examples instead of putting up unverifiable strawmen.
The truth of the matter is that 99% of real world applications run just fine, and it doesn't pay off to invest in shaving milliseconds here or there. Would it be desirable to have a magic wand to improve some edge cases? Yeah, why not? Is it worth paying people to spend time with a stopwatch at hand to shave off these milliseconds? Not really. It's all about tradeoffs, and there is no real world payoff in wasting developers' time to shave off that millisecond here or there.
You know, I started on some concrete examples for you, but I had to stop and back up. Really? You can't think of any examples yourself? The modern web is absolute hell to use if you aren't on modern hardware. Try it sometime; use a 10 year old phone, or an old computer that wasn't built top-of-the-line.
There's so much hardware out there that can run native applications just fine, that can play back HD video, that can run complex 3D real time video games, but crawl like molasses when loading your average webpage. Facebook and YouTube are terrible offenders, but so are your average blogs. Many banking websites are terrible (yet they don't have to be; my local credit union has a zippy website that looks attractive to boot, has modern design elements, etc.).
Maybe the hardware you're running is eye-wateringly fast, or maybe it's just barely fast enough and you don't need the cycles for anything else. But we're not talking milliseconds. We're talking order(s) of magnitude. I can't bring myself to believe you don't see at least some of it, if you just open your eyes and look around.
The trouble with this attitude is that it misses the fact that the _range_ of computing power of devices which ordinary people use is probably greater now than it's ever been.
Let's not even get into low end phones in developing countries or bargain basement android tablets, let's stick with something straightforward - an ordinary PC.
I took a quick look online, sorting by cheapest first I found something with an AMD 3015e in. Based on cpubenchmark.net that gets a benchmark of 2691. Taking a look at the big list of CPUs I see that's equivalent to a powerful desktop CPU from 2008, or a decent laptop from 2012. (The Apple M2 in the current Macbook Air gets a score of 15369, just for comparison.)
So, if you're writing PC software or making a fancy web app and you want everyone to have a good experience with it then you should see how it runs on a terrible new laptop, or a 12 year old good laptop, or a 15 year old powerful desktop.
(And yeah we all have SSDs now which is much better than in the old days, and JS is generally single thread, and single threaded CPU performance has not improved so much - but I think my point still stands.)
Since we're talking about React, Facebook's web app is incredibly slow to me, and typing text in some fields is slower than me. As in, it sometimes takes a second for the letters to start showing. And no, it's not a browser rendering issue (even if it were it would be terrible), as disabling JS in Safari brings back the performance.
Another app, which may or may not be React, is New Reddit. It is significantly slower on my computer and on a lot of people's computers, and sometimes you have to refresh the page because it consumes too much memory.
I can come up with other local examples, from Germany. Vattenfall's website seems to use "traditional" page navigation, but the content is loaded via a framework. Due to having to reinitialize everything, text takes up to 5 seconds to appear when using the back button or when navigating from page to page. Similar things happen on the website of the German Agentur für Arbeit.
My hot take is that React doesn't directly _cause_ slow performance, but that React is so well marketed as a first place to start for new programmers that the bar for quality is much lower than other industries (like game programming, or even "vanilla JS" which is increasingly seen as an advanced approach).
The promise of frameworks like React is that you write code in their way and they take care of performance, because functional, declarative and all that. You just use primitives and don’t control how it all works under the hood. Coping may be a good strategy here, but isn’t a good argument.
> The truth of the matter is that 99% of real world applications run just fine, and it doesn't pay off to invest in shaving milliseconds here or there.
You're taking deep quaffs of the Kool-Aid and so are most of the people commenting on this story. General software responsiveness and reliability (i.e., usefulness) has been in decline for decades. This is an objective fact.
Writing objective reality off as mere "milliseconds", "edge-cases", or only relevant for "toy problems" exemplifies the arrogance and severe incompetence of most programmers. People are seriously trying to talk down to Casey Muratori when in all likelihood they haven't accomplished even 1% as much as him as programmers.
I get it—no one wants to leave fantasy-land as long as the easy money is flowing. But sooner or later the glittering carriage turns back into a pumpkin.
I realised when I implemented EdDSA for Monocypher that optimisations compound. When I got rid of a bottleneck, I noticed that another part of the code was the new bottleneck, and some of the optimisations compounded multiplicatively. It took many changes before I finally started to hit diminishing returns and stop. All while restricting myself to standard C99, and trying fairly hard not to spend too many lines of code on this.
My point being, if most of the program is slowed down by a slew of wasted CPU cycles (costly abstractions, slow interpreted language…), there's a good chance what should have been obvious bottlenecks get drowned in a sea of underperformance. They're harder to spot, and fixing them doesn't change much.
So before you even get to actual optimisation, your program should be fast enough that actual optimisations have a real impact. And yes, actual optimisation should be done quite rarely. But first, we need to make sure our programs aren't as slow as molasses. See https://www.youtube.com/watch?v=pgoetgxecw8
> I realised when I implemented EdDSA for Monocypher that optimisations compound.
I feel you're missing the whole point.
It's immaterial whether anyone can get to optimizations that compound multiplicatively. The whole point is that halving something that costs nothing earns you nothing. That's the whole point. Go ahead and shave off that millisecond. Will anyone actually notice whether you add or remove that penalty? Odds are, not at all.
People did notice. Quite a few happy users are glad signature verification took less than a second instead of more than 3. Or 30, if you compare to some of the alternatives. Others love the fact it uses 2KB of stack space instead of 5.
Monocypher's speed was actually an important component in its success in the embedded market, even though I didn't explicitly target it initially (I was lucky my portability driven decisions made it a good fit there).
Not your parent but I think that is exactly the point lots of people here are making.
There definitely are niches where there are quite a few performance optimization opportunities that users do care about.
In your example making something a user is actively waiting for go from 3 seconds to less than one is a great optimization target. What is not a great optimization target is making something the user is actively waiting for and that takes 30ms take 25ms instead. That's wasted money on developer time.
If your "user" is a developer of embedded software with memory constraints and using your library leaves them more room that's awesome. If your user was someone using the library on a general purpose computing device with loads of memory then the 2 vs. 5 does nothing.
You need to do some research on how to get modern C/C++ compilers to vectorize. ;-) No assembly required, and not that hard to restructure code. (But MUCH easier in C++).
I tried auto-vectorisation, and it worked pretty well. But the code became just as big as using intrinsics would have (that with explicitly unrolling loops and rearranging things in memory), and intrinsics generated code that was easily 35% faster.
I decided to put it off for later, and keep things simple for now.
So far it seems like if you have some parsing or formatting task that can be trivially vectorized, the compiler will never do that, and you absolutely must use intrinsics.
> the whole world is comprised of the 99% of cases where shaving off that millisecond buys you absolutely nothing
Who was talking about a single millisecond here?
I notice, broadly, two types of people who engage in these arguments.
1> OMG, computers are thousands of times faster than they were a decade ago, why is everything not lightning fast? Why are so many things slower than they were back then? Why is my chat program eating 2GB(!) of RAM?
2> Because we're busy writing six billion features on our Nth iteration of this problem space, we can't be bothered to shave a few millis bro!
Profiling. If you're not profiling, you're completely wasting your time. The 1% is almost never where you think it is.
And when you do identify the 1%, you need to be testing optimizations with a profiler constantly while optimizing. Profile. Do some optimization. Profile again. Roll back if not successful. Repeat until done. It's impossible to optimize well if you're not doing profiling.
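As a minimal illustration of the measure-first step of that loop, here is a Python sketch using the standard library's `cProfile` and `pstats`. The workload functions are invented for the example; one of them has a deliberately quadratic membership check so the profile has something to find.

```python
import cProfile
import io
import pstats

def slow_membership(n: int) -> int:
    """Deliberately quadratic: 'in' on a list scans it every time."""
    seen = []
    for i in range(n):
        if i not in seen:
            seen.append(i)
    return len(seen)

def cheap_work(n: int) -> int:
    return sum(range(n))

def workload() -> int:
    cheap_work(2000)
    return slow_membership(2000)

# Step 1: profile before touching anything.
profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Step 2: read the report, sorted by cumulative time, top entries only.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
```

After a change (say, swapping the list for a set) you would run the same profile again and keep the change only if the report actually improves.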
The ultimate tools would be either Intel's profiling tools, or ARM's profiling suite (both very expensive). But MSVC and GCC do a fantastic job of scheduling instructions to avoid pipeline stalls these days, so these deep profiling tools are unlikely to gain more than 2 or 3% performance increases these days. (Worthwhile pretty much only if you're writing GPU drivers for NVidia or AMD).
Taken as a given that 100% of your code is at least algorithmically correct in the first place. (Appropriately better than O(N^2) whenever possible).
- Former writer of graphics drivers, currently audio DSP engineer.
I'm not familiar with how it compares to ARM/Intel's profiling tools, but I found the Linux perf suite to be very capable (though limited to Linux obviously). And Hotspot [1] allows effortless profile visualization using flame graphs, including some very interesting features such as off-CPU time profiling [2]. "perf record" coupled with Hotspot forms a very smooth edit-compile-profile cycle.
Agreed. Linux perf tools are perfectly acceptable for all but the most ultra-extreme optimization tasks.
VTune allows you to determine where pipeline stalls are occurring at the instruction level (for that last 2 or 3% gain in performance). I haven't worked with ARM profilers (way out of my price range), but I assume, given the exorbitant price, they provide the same sort of in-depth analysis. Probably a handful of people on the planet that need that kind of in-depth analysis.
In my experience: a profiler, usually. Just because I can throw down a lot of code quickly doesn't mean I don't have the tools to analyze code when I go "hmm, that seems slow".
Sure. That's a tradeoff you consciously make to get the thing out the door. That's what technical debt is. You pay it down later. (Or you go bankrupt and it doesn't matter anymore.)
Testing is often not enough. Developers might have a few test cases with a few records, or a few other users. Real users might do more with it, and things which perform fine in testing suddenly turn into real performance bottlenecks when you're loading an order list with a thousand entries in it instead of two.
When I said testing that includes integration testing, not just unit tests. We do this exact thing with queries that are known to be complex, run them against databases the same size and complexity as real production databases. It's not hard.
Building representative databases is not straightforward.
If you're lucky enough to eat your own dogfood, run unit tests against a copy(!) of your in-house database. A utility to anonymize the data was a fairly significant investment, but absolutely worth it in the long run. Being able to run and monitor benchmark unit tests for selected critical operations on enterprise-scale test data as part of the continuous-build process: fabulous!
In any organization it generally pays off to have most people have a basic knowledge of a subject and then hire domain experts to drive most of the impact. For security, for example, this typically manifests as having very general "best practices" for most developers to follow and then a small team that handles anything that requires advanced understanding of the area.
How this typically works with performance is very similar, with a small team working to identify problematic areas where optimization would drive the highest impact, and the rest of the organization keeping performance in mind but not otherwise concerned with it in their day-to-day work.
I don't disagree that the balance is shifting towards "why is this taking so long". There's ebbs and flows in that ecosystem.
But overall, I think you overestimate how much time you spend loading the website and how much time it's just sitting there, mostly idle.
And in the end, as long as it's fast enough that users don't stop using the site/webapp/program/whatever, then it's fine, imho. When it becomes too slow, the developers will be asked to improve performance. Because in the end, economics is the driver, not performance.
As an end user, I would prefer to live in a world where I don't have to wait for software to respond. If forced to, I would also continue to use software where I do have to wait. Your argument that economics will dictate the best solution and lead to a happy balance doesn't work. It will tend towards a borderline tolerable world, where your product only has to be better than the other guy's.
This is separate from the discussion about tradeoffs between flexible design patterns and low-level performance.
He never said "happy", he said it's fine. Meaning it's fine that you have to live in a world where you have to wait for some software to respond, even if you would prefer not to. Whether you are happy in such a (horrible || ok) world is up to you.
I think the point would be: what if instead of using a whole core and it takes 2000ms to load because it is essentially "spinning its wheels" it would only use half a core and it takes 50ms?
2s you notice as a user. 50ms you won't. In fact even 500ms you won't notice too much but we are getting close to where optimization will be noticeable to a user.
And depending on what you do, economics will let performance decide who wins. Git is probably one example. A different example: have you ever compared the battery life of smartphones before buying one?
> That's really not even close to true. Loading random websites frequently costs multiple seconds worth of local processing time
Unless we’re talking about specific compute-intensive websites, this is almost certainly network loading latency.
Modern web browsers are very fast. Modern CPUs are very fast. Common, random websites aren’t churning through “multiple seconds” of CPU time just to render.
I just loaded cnn.com while looking at the CPU utilization graph: 50% use of my 8 logical cores at 4.5GHz for the better part of a second. So no, it's not just network latency. Doing multiple parallel network requests, parsing, doing layout and applying styles, running scripts, decoding media... a modern website and browser devours CPU time.
Jira cloud famously takes 30-60 seconds on even a fairly high-end laptop, which is just staggering. I can install an entire operating system into a virtual machine in that time.
I was about to dispute what you said, but I realised I was running uBlock Origin. What machine and browser are you using? I just used Chrome (on macOS with an M1 Pro CPU) to run a performance profile, and loading cnn.com took the following:
- Loading: 29ms with uBlock vs 42ms without
- Scripting: 548ms vs 1850ms (!!)
- Rendering: 42ms vs 105ms
- Painting: 8ms vs 29ms
- System: 216ms vs 295ms
- Idle: 460ms vs 8707ms
Holy shit, advertising and trackers are absolute resource hogs.
one of them must be to blame! readers and publishing engineers famously sit atop corporate decision-making hierarchies, and no one else in a mass media enterprise ever did anything wrong, certainly not before the web was a thing
I once optimised an SPA that had to be really fast for usability reasons (industrial use). I replaced all the 'high level' JS patterns such as map and filter, plus the frontend framework things, with plain if/else, for loops and native DOM manipulation, and it ended up more than 10x faster. Each click would update the app in one frame; it was very noticeable. So yes, CPU cycles do matter for websites, even with modern hardware. However, the code was more verbose and needed a lot more technical know-how to understand and maintain.
Yep. In Rust land for instance it'd get compiled down to a for loop and be ridiculously fast. Every time you do a .map or a .filter in JS though it gets abstracted down to a function that makes an allocation for the ENTIRE ARRAY and then copies the entire array into that new array doing whatever you asked to the data. The JS VM might be able to optimize some of it away but the abstraction is awful.
So if you were to manually run a for loop over an array instead of iterating through it I'm not surprised you got an order of magnitude faster performance.
So normally when you do a bunch of patterns of .filter(), and/or .map(), and/or .reduce() on an array in most other languages the compiler will normally iterate through the elements doing whatever you requested at each element. No allocations, one quick trip through the entire working set, cache locality works no matter how big the set is.
In JavaScript on the other hand it handles the abstraction by constructing a separate function for each pattern you use. In those functions it allocates an array the same size as the working set, iterates through each element, then returns the new array. If you're doing multiple operations at once this means that you have to have multiple allocations and multiple iterations through the entire working set.
Because it's doing an allocation, it's basically doing an extra memcpy for each additional pattern past one, which is a giant slowdown. Then if the working set is too big for the L2 cache it needs to be reloaded from L3 each loop. If the working set is too big for L3 then it needs to reload from main memory EACH TIME.
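The eager-versus-fused distinction can be shown in any language. Here is a Python analogue (not JavaScript, and it makes no claim about what a given JS engine manages to optimize away): `chained` materialises a full intermediate list per step, while `fused` and `lazy` make one trip through the data. Function names are mine.

```python
def chained(values):
    """Eager style: every step builds a complete intermediate list."""
    doubled = list(map(lambda v: v * 2, values))        # allocation 1, pass 1
    kept = list(filter(lambda v: v % 3 == 0, doubled))  # allocation 2, pass 2
    return sum(kept)                                    # pass 3

def fused(values):
    """Hand-written loop: one pass, no intermediate collections."""
    total = 0
    for v in values:
        d = v * 2
        if d % 3 == 0:
            total += d
    return total

def lazy(values):
    """Iterator pipeline: reads like the chained version but elements
    flow through one at a time, so there is still only one pass."""
    return sum(d for d in (v * 2 for v in values) if d % 3 == 0)

data = range(10)
results = (chained(data), fused(data), lazy(data))
```

All three compute the same number; what differs is how many times the working set is walked and how much temporary memory is allocated along the way.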
If you wanted to implement patterns as slow as molasses I can think of no better way than to sugar it out like JavaScript did.
Approximately, the slowest thing you can do in a program is memory allocation (and garbage collection is even slower). JS map allocates an entire new array.
This might hold true if you’re talking about desktop browsers, but it’s a different story on mobile, particularly in rapidly growing emerging markets. Both network latency and large JS payloads dramatically affect user experience on low powered devices, and if UX isn’t compelling enough, there have been plenty of studies showing the real financial costs of slow web pages for businesses that depend on those websites to bring in customers.
I’ve personally spent many hours doing performance analysis, triage and remediation on websites built using modern tech stacks that had inadvertently exchanged UX for DX. Too much JS sent over the wire can definitely tie up the browser’s main thread for whole seconds even on desktop, though in my experience it’s much more common on mobile. This situation can be difficult to correct depending on the abstractions, organization and overall architecture you chose early on, and code-splitting and dead code elimination won’t always fix what’s broken.
Well, you can make things run even faster by hand-coding it in assembler... but performance isn't the reason we use high level languages. I agree with you that ignoring performance characteristics in favor of speed-to-market is an awful and pervasive practice in modern software development, but the linked article isn't talking about or making that case at all. He's saying that he can make his own custom object oriented C language that runs faster than C++ itself, but that's not news - people were saying that in 1995 (at least). The maintainability hit isn't worth it.
It's not really possible to write hand-coded assembler that's faster anymore. C/C++ compilers have deep knowledge of architecture-specific instruction pipelines that allow them to schedule instructions more accurately than any human could.
My most recent misadventure: trying to write hand-coded ARM Neon assembler, and/or C++ code with neon intrinsics to optimize a piece of real-time audio effect code. The clear performance winner: plain C++ with no intrinsics, but tweaked to allow auto-vectorization (plus judicious use of the __restrict modifier for a small but significant boost). GCC produced code that had better instruction scheduling than I could (but not for the NEON intrinsics, oddly). And as an added bonus, the same plain-old C++ code generates AVX vectorization on MSVC without modifications! (MSVC also supports __restrict until the C++ standards committee gets their act together to adopt the eminently necessary C99 restrict keyword).
>It's not really possible to write hand-coded assembler that's faster anymore.
That is wrong. Just because you didn’t succeed in writing faster code in one case, doesn’t mean it’s impossible. See e.g. [1] on why the Lua vm is written in assembly. It’s from 2011 but not much has changed in the meantime.
Since you bring up React in your example, which framework should one use to build better performing web apps?
I know React tends to lack in both dev UX and performance (at least in my exp). Personally I've taken a look at Svelte and Solid, and liked them both. I haven't had the chance to build anything larger than a toy app, though.
I recently tried Vue 3 at a startup for a new app. A few days after starting it I rewrote a personal Svelte app in Vue 3 since I found it so fluid (Composition API w/ script setup). I was liking Svelte before that.
Vue could also be an option, but I personally want to learn some Solid, as I see it could be preferred by the current mass of React frontend developers, more than Svelte and Vue. The syntax and philosophy of Solid looks closer to React, while having a stronger focus on performance.
Okay, I'll bite: does Vue perform better than React? Your post makes no mention of this, I don't know if it does, and offering it as an alternative due to performance reasons, without knowing this, seems a tad premature.
In my personal experience, the Vue apps I've worked on have been snappier. There are some fast React apps out there, but I think a lot of work goes into optimizing React apps, versus Vue being pretty fast by default.
When React introduced hooks, it was fun for a while, but then we discovered the frequent re-render issues and had to change the way we think when building components and refactor old components. We have to manually wrap components in useMemo. This is the kind of thing Vue has spared me so far, and where React let me down. React relies on re-running all render methods instead of doing granular updates, and I think this hurts performance. Vue listens to property changes and can perform granular updates.
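To show what "granular updates" means in the abstract, here is a toy Python sketch of the subscribe-to-a-value idea. It is not how Vue or React are implemented; `Signal`, `render_log` and the two values are hypothetical, and the "rendering" is just appending to a list.

```python
class Signal:
    """Holds one value and notifies only its own subscribers on change."""

    def __init__(self, value):
        self._value = value
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def set(self, value):
        if value == self._value:
            return  # unchanged value: nobody is asked to re-render
        self._value = value
        for callback in self._subscribers:
            callback(value)

render_log = []
title = Signal("Inbox")
unread = Signal(0)
title.subscribe(lambda v: render_log.append(f"title:{v}"))
unread.subscribe(lambda v: render_log.append(f"unread:{v}"))

unread.set(3)         # only the unread badge updates
unread.set(3)         # same value again: no work at all
title.set("Archive")  # only the title updates
```

The contrast is with re-running every render function on any state change and then diffing the output, which is where the manual memoization comes in.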
I kind of want to amend that! I discovered today that maybe there is hope with granular update libraries such as Jotai, @preact/signals-react and recoiljs.
I look forward to returning to the days where everybody wrote their own slightly-to-significantly-wrong state management tooling while being distracted by the minutiae of DOM wrangling. That was a good time.
(It was not. It was why I stopped doing frontend work.)
Yeah, but it's the one we've got. As much as people want to sniff about it, that bell isn't getting un-rung for many, perhaps most, use cases--on balance things are better where we're at now.
For what it's worth, less than 4% of websites use React (approximately 4% use any JS framework). If you believe the web is slow because of React you are wrong. It's not even due to JS.
4% of what? I would wager a guess that close to 100% of the Alexa top 1000 websites use a JS framework, and a significant portion of those use something heavy like React or Angular.
Oh it definitely is. But no single framework (or set of frameworks) is to blame. It's the CMS. The thing that allows every Tom, Dick, and Harry at the company to drop their little snippet for this or that, which adds up to a mountain of garbage over time, half of which is disused or forgotten.
I’m curious, where do you get those numbers from? Those are shockingly low numbers and don’t align with my observations, but I also don’t have hard figures to back them up.
I would argue that ignoring performance, a lot of "clean" code isn't really that much clearer and more maintainable at all (At least by the Robert Martin definition of "Clean Code"). Things like dependency injection and runtime polymorphism can make it really hard to trace exactly what happens in what sequence in the code, and things like asynchronous callbacks can make things like call stacks a nightmare (granted, you do need the latter a lot). Small functions can make code hard to understand because you need to jump to a lot of places and lose the thread on the sequential state (bonus points if the small functions are only ever used once by one other function). The more I work in code bases the more I find that overusing these "clean" ideas can obscure the code more than if it was just written plainly. I think a lot of times, if a technique confuses compiler optimization and static code analysis, it's probably going to confuse humans also.
There are some videos on the internet claiming object oriented programming is pretty bad in many situations. And lately I've been wondering if there's a kernel of truth in this statement. As an alternative, procedural programming is often advised instead.
I think a lot of the clean code advice in general is related to object oriented programming.
I've noticed that once my Lua programs (games) grow to a reasonable size, they become kinda hard to maintain. And I tend to use an object oriented programming style (of course it also doesn't help that Lua is not typesafe). After I finish my current game, I want to try to make a game using a procedural approach. I wonder if this would solve some of the issues I see in my current code base.
One of the core ideas of procedural programming is that data and functionality are not mixed in classes as we do in object oriented programming. Instead, you might have a module that contains some functions and some data objects the functions act upon. This approach would make some other aspects of game programming with Lua easier as well (e.g. serialisation), and perhaps it will also make the code easier to maintain as the size of the codebase grows. It's something I want to contemplate.
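A minimal sketch of that layout, written in Python as a stand-in for Lua. The `Player` record and the `move`, `take_damage`, `save` and `load` functions are hypothetical names chosen for illustration. The data is a plain record with no behaviour, and the functions that act on it live beside it in the module, which is what makes serialisation nearly free.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Player:
    # Plain data only: no methods, nothing hidden.
    name: str
    x: float
    y: float
    hp: int


def move(player: Player, dx: float, dy: float) -> None:
    player.x += dx
    player.y += dy


def take_damage(player: Player, amount: int) -> None:
    player.hp = max(0, player.hp - amount)


def save(player: Player) -> str:
    # Because the record holds only plain fields, it maps directly to JSON.
    return json.dumps(asdict(player))


def load(text: str) -> Player:
    return Player(**json.loads(text))
```

Whether this scales better than an object oriented layout is exactly the open question in the comment above; the sketch only shows why saving and loading get simpler.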
I think it is important to understand what makes a style actually different and what is just semantics.
For instance, if you write a function to do some operation on an object, you could have written that as a method instead. But ultimately it is the same code, it is unlikely that the difference matters much for either performance or readability.
However if you need to do some operation on a bunch of objects you could pack each operation on the individual objects in a method, and call those methods from a main function. Or you could just put it all in one function, with as many nested loops and if statements as there needs to be. Now the difference is real, you pay in performance for a lot of function calls, and following the control flow is different.
Personally I tend to prefer the one function, but sometimes part of it makes sense as its own function, in particular when I can avoid duplication that way.
There is no silver bullet, but best of luck, changing one's style can be hard.
Separating code from data definitely helps for serialization; while TypeScript and JavaScript aren't great for game development, it's incredibly nice to do it that way in them. It can also help for things like network code or for cloning an object.
When I am coding for myself I try to separate the data from the operations performed on it. I also think about how the data flows through the application. I use a mix of procedural and functional paradigms.
> Look, most modern software is spending 99.9% of the time waiting for user input, and 0.1% of the time actually calculating something. If you're writing a AAA video game, or high performance calculation software then sure, go crazy, get those improvements.
The CPU meter when clicking anything on a "modern" webpage proves that's a lie.
Also, sure, even if "clicking on things" is maybe 1-5% vs "looking at things" THAT'S THE CRITICAL PATH.
Once the app has rendered a view, obviously it is not doing much, but the user is also not waiting on anything and is "being productive", parsing whatever is displayed.
The critical path, the wasted time, is the time the app takes to render stuff, and "but 99% of the time it is not doing that" is irrelevant.
>Clean code optimizes for improving time-to-market for those features
Does it though? Where's the evidence for it? The vast majority of people I've worked with over the last couple decades who like to bring up "clean code", tend towards the wrong abstractions and over abstracting.
I almost always prefer working with someone who writes the kind of code Casey does than with someone who follows the clean code examples I've spent my career dealing with. I've seen and worked with many examples of Data Oriented Design that were far from unmaintainable or unreadable.
Completely agree. These rules simply do not lead to better outcomes in all cases. Looking at the rules and playing Devil's advocate for fun:
> Prefer polymorphism to “if/else” and “switch”
Algebraic data types and pattern matching (a more general version of switch) make many types of data transformation far easier to understand and maintain (versus e.g. the visitor pattern, which uses ad hoc polymorphism).
> Code should not know about the internals of objects it’s working with
This is interpreted by many as "don't expose data types". Actually some data types are safe to expose. We have a JSON library at work where the actual JSON data type is kept abstract and pattern matching cannot be used. This is despite the fact that JSON is a published (and stable) spec and therefore already exposed!
> Functions should be small
"Small" is a strange metric to optimise for, which is why I don't like Perl. Functions should be readable and easy to reason about. Let's optimise for "simple" instead.
> Functions should do one thing
This is not always practical or realistic advice. Most functions in OOP languages are procedures that will likely perform side effects in addition to returning a result (e.g. object methods). Should we also not do logging? :)
> “DRY” - Don’t Repeat Yourself
The cost of any abstraction must be weighed up against the repetition on a case-by-case basis. For example, many languages do not abstract the for-loop and effectively encourage users to write it out over and over again, because they have decided that the cost of abstracting it (internal iteration using higher-order functions) is too high.
My 2 cents: I don't see algebraic data types as strictly superior. It's just the other side of the polymorphic coin: there is open and closed polymorphism. Open polymorphism happens with interfaces, inheritance, and typeclasses: the number of cases is unlimited. Closed happens with ADTs: an enumeration of the cases.
Open is great for extensibility: libraries can be precompiled, plugins are possible. Changes don't propagate: it is "forward compatible", which is great for maintaining the code.
Closed on the other hand is great for matching. Finite is predictable & faster. Finite is self-contained and self-describing because it exposes the data types without shame.
The purpose of the visitor pattern is now clear: it closes the open polymorphism for a finite set. Great, now we only need one kind and we still get matching. Or is it the worst of two worlds? Slow & incomprehensible and all changes propagate everywhere.
So which one is better? Neither. But the reality is that all old imperative languages with polymorphism chose the open kind - the kind that adds more features because it was needed for shared libraries. Leaving you to build any pattern matching yourself and to burn yourself with the unmaintainable code.
If people get burned, they learn. First they say don't do that and only then they replace the gas stove by induction.
You are confounding three separate skills. Finding the right abstractions is an art, whether you write clean code or not. Writing high performance code is another art.
A really good developer writes clean code using the right abstraction (finding those tends to take the most time and experience) and drops down to a different level of abstraction for high performance areas where it makes sense.
The fact that bad developers suck and write bad code no matter if they use clean code or not does not reflect on the methodology.
If there are no hard measurements I can use to determine the value of "clean code" then I fall back on the results produced by people who say they are writing clean code. There is no realistic way to objectively measure coding styles completely isolated from the people writing the code.
I personally haven't seen value from that coding style. There may be some platonic ideal clean code that is better than other methodologies in theory--it is likely that my sample is biased--but from what I've seen, the clean code style tends to lead most developers towards over abstraction.
I agree, but I think that mostly comes from Clean Code being kind of required reading for junior developers, who lack the experience to understand those concepts in context. No methodology is perfect, and there are always cases where one needs to break out of them; knowing when to do that comes with experience.
For juniors who have no experience, any sane methodology is better than none, since otherwise you get even more of a mess.
That said, Clean Code has some great advice, some mediocre advice and some frankly bad advice, but the author's points are largely irrelevant to 99% of software engineering.
It is easier to find an abstraction if we lay out what the program is doing all in long functions, that just "do what they do" until you figure out what needs to be abstracted.
Having done a lot of performance work on Gecko (Firefox), we generally knew where cycles mattered a lot (low level graphics like rasterization, js, dom bindings, etc...) and we used every trick there. But for the majority of the millions of LoC of the codebase these details didn't matter like you say.
If we had perf issues that showed up outside those areas, they were higher level design issues like 1) trying to take a thumbnail of the page at full resolution for a tab thumbnail while loading another tab, not because the thumbnailing code itself was slow, or 2) running slow O(tabs) JS teardown during shutdown when we could run an O(~1) clean up step instead.
What you're basically saying is "modern computers are so much faster than anyone needs them to be, it's okay to make them a little slower."
This works until your computer is old enough to be slower than what a majority of wealthy people (ie desirable customers) are using, at which point you need to buy a newer, faster computer, even though your current one was already "faster than anyone could reasonably need it to be".
This is all harmless enough—a little disrespectful perhaps, to make other people waste their money, but not so terrible—until you consider the environmental impact of all these new computers, which the average spreadsheet absolutely should not need but does anyway. It's also an equity issue—someone on a fixed income can't necessarily afford a new machine.
What would actually happen if Moore's law ended tomorrow, and we were no longer able to make computers any faster than they are today? It would really suck for scientists and hardcore gamers, but I actually think a majority of computer users would benefit. The experience of someone who just writes documents and checks email would be unchanged, except that their current computers would never slow down!
> Look, most modern software is spending 99.9% of the time waiting for user input, and 0.1% of the time actually calculating something.
Assuming that is right, there is something called multitasking: the CPU, the RAM, and most importantly the cache are not all yours. If there are 1000 pieces of software like yours, each busy 0.1% of the time, that adds up to 100%. You may argue that 1000 pieces of software is unreasonable, and you would mostly be right, but it happens, and mostly for the same reason software isn't optimized: quantity over quality.
Another issue is that you have to make a distinction between throughput and latency. You don't have to keep up with a sustained 100 actions per second, people don't go that fast, but you definitely have to respond within tens of milliseconds, because more than that is noticeable. Latency is much harder to optimize and if you are in the critical path, these cycles may matter.
A lot of devices are battery powered these days, and all these wasted cycles are reducing the battery life of the entire system. Mobile devices are crazy powerful these days, but this power is meant to be used sparingly. And even with line powered devices, I think we waste enough energy as it is...
And finally, what is the point of "clean code"? Hopefully not just because it gives software architects boners. The point is usually to make software that will last: easier maintenance, fewer bugs, etc... But performance bugs exist too, and one of the most common ways software evolves is to do more of what it already does. An image editing program will process more and bigger images, a database will store more entries and more details about each entry, documents will get larger, etc... You may even find that your users are using some feature on a scale you never intended, maybe someone is pasting entire books into your note taking app, and it may turn out working quite well... if you cared about performance. Not caring about performance is technical debt, and it may negate the advantage of using "clean code" in the first place.
> I think the author is taking general advice and applying it to a niche situation.
I don't. how many programs are running in your OS right now? how much CPU do you need to keep those things plus the things you need running in a performant manner?
how much CPU would you need if things performed better? the answer is "less" every time.
better software performance = less money required for hardware to obtain the responsiveness you require.
it's important, and it's important completely independently of how it is framed here.
just wait till you've seen software get slower for 30 years. to put it another way, watch hardware get faster and faster and faster for 30 years while you observe software continually consume all of the available headroom until it feels slow again. watch that happen for THREE DECADES and wait for someone to tell you that everything is fine and that someone saying "software is unnecessarily slow" is wrong because they aren't framing their argument how you think it should be framed.
>Look, most modern software is spending 99.9% of the time waiting for user input, and 0.1% of the time actually calculating something.
All that says is you should focus your energy on increasing the value of that .1%. It's not actually an argument to not spend any energy.
It's like saying 'Astronauts only spend .1% of their time in space' or 'tomatoes only spend .1% of their existence being eaten' - that .1% is the whole point.
You can debate how best to maximize that value, more features or more performance. The OP is suggesting folks are just leaving performance on the floor and then making vacuous arguments to excuse it.
I don't know man, my TV has hardware several orders of magnitude faster and more advanced than the hardware that took us to the moon, and it takes dozens of seconds for apps like Netflix or Amazon Prime Video to load dashboards / change profiles or several seconds to do simple navigation or adjust playback. People just don't know how to properly write software these days, universities just churn out code monkeys with a vicious feedback loop occurring at the workplaces afterwards.
yes. I've observed hardware get faster and faster for 30+ years and I've watched software consume all of that headroom the entire time, for no clear reason other than the way we write software is just getting worse and worse and worse.
Not only time to market, but also maintainability.
In non-performance-critical areas, it's pretty important that when the original dev team leaves, new hires can still fix bugs and add features without breaking things.
that's a separate thing from polymorphism vs. switch case in the post that people confuse.
if he simply kept the original "unoptimized" switch case method, what you say wouldn't apply. it couldn't. from a pure feature standpoint, a switch case is functionally identical to polymorphism except that you can't add types that are unknown at compile time (like loaded at runtime as an extension from a library or something). and that version is already faster.
what the blog post does after that point is merely point out that by having everything in one place, you see opportunities to optimize. this is a separate thing where you still have to consider whether that actually makes sense.
like, if you can guesstimate that an entirely different type is likely to enter the picture at some point, you may skip this optimization. if production realities mean that code needs to be faster, you can still apply it and add some comments about how to change it back, or just keep the original version commented out with a reference for why it's there. and so on.
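To make the two steps concrete, here is a sketch in Python rather than the article's C++, so it shows the shape of the code and not the cycle counts. The `Kind` enum, the `ShapeUnion` field layout and the function names are assumptions for illustration; in particular it assumes a circle stores its radius in both `width` and `height`, and a square stores its side in both. `area_switch` is the plain dispatch on a type tag; `area_table` is the kind of follow-up that only becomes visible once all the cases sit side by side.

```python
import math
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    SQUARE = 0
    RECTANGLE = 1
    TRIANGLE = 2
    CIRCLE = 3


@dataclass
class ShapeUnion:
    kind: Kind
    width: float   # side for a square, radius for a circle (assumed layout)
    height: float  # same value as width for squares and circles (assumed)


def area_switch(shape: ShapeUnion) -> float:
    # Step 1: dispatch on the tag instead of a virtual call.
    if shape.kind is Kind.SQUARE:
        return shape.width * shape.width
    if shape.kind is Kind.RECTANGLE:
        return shape.width * shape.height
    if shape.kind is Kind.TRIANGLE:
        return 0.5 * shape.width * shape.height
    if shape.kind is Kind.CIRCLE:
        return math.pi * shape.width * shape.width
    raise ValueError(f"unknown shape kind: {shape.kind}")


# Step 2: every branch above is coefficient * width * height,
# so the branches collapse into a lookup.
AREA_COEFFICIENT = {
    Kind.SQUARE: 1.0,
    Kind.RECTANGLE: 1.0,
    Kind.TRIANGLE: 0.5,
    Kind.CIRCLE: math.pi,
}


def area_table(shape: ShapeUnion) -> float:
    return AREA_COEFFICIENT[shape.kind] * shape.width * shape.height
```

Step 2 is the part that needs a judgment call: it bakes in the assumption that every future shape's area is a constant times width times height, which is why the comment above suggests skipping it when a very different type is likely to show up.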
> And now the shape_union must be rewritten from scratch.
We spend most of our time reading code. If Casey's code snippets are easier to reason about (which they are, especially as the codebase gets larger), that's a big win. I'd imagine you want to optimize for code that is easy to (re)write, rather than minimize the number of key strokes while increasing the time spent understanding the code.
I'd say that needing rewrites is bad for the future clarity of the code. So much so that people conclude the 50% performance hit is worth it to use open polymorphism (interfaces, etc.) over closed (tagged unions, algebraic data types, etc.). So I optimize for the ability to reason about the code over the lifetime of the project, above the current ability to reason about it and above performance (with exceptions).
What is adding a polygon going to do? Either go back to the 'bad' interface to hide the variable object size, extend the tagged union to the size of the biggest data structure (hurting cache performance each time it grows), or build an involved Object of Arrays structure, which is not good for clarity but great for performance.
All while having to remember which field, width or height, to use for the circle's radius.
I think the problem comes from applying “clean code” as a standard pattern.
As demonstrated in the example, extensibility costs in performance, and also sometimes in comprehension. If we apply the “clean code” rules as a matter of course, we pay this price always.
In my opinion, we should use interfaces etc at module boundaries only.
> Look, most modern software is spending 99.9% of the time waiting for user input, and 0.1% of the time actually calculating something
It's because of this mentality that almost all desktop software nowadays is bloated garbage that needs 2GB of RAM and a 5GHz CPU to perform even the most basic task that could be done with 1/100th of the resources 20 years ago.
No, it's not because of this mentality. Remember the timeless quote:
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
Yet we should not pass up our opportunities in that critical 3%."
Using Electron and bloated frameworks that "abstract" things away is the biggest problem for most modern software, not the fact the developers aren't optimizing 10 cycles from the CPU away. It's a fundamental issue of how the software is made and not what the code is. If you need to run a whole web browser for your application, you already lost, there is no optimization that you can do there.
I think the sibling comment here in this thread shows what some developers think. Electron is not reasonable. "Developers" using Electron should be punished by CBT and chinese torture.
Okay maybe that's too much. But I suggest a new quote:
"We should forget about Electron, say about 100% of the time: Electron is the root of all evil.
Yet we should not pass up our opportunities in rewriting everything in Rust."
RAM and GPU are cheap. Most users aren't going to notice. Meanwhile by choosing Electron, the developers were able to roll out the app on Windows, Mac, and Linux at nearly zero marginal cost per additional platform.
And other lies you can tell yourself to sleep at night.
Most people notice. Very few have the capacity or power to complain about it, or even realise they could. They accept what they’ve been given, despite how awful it is, because they have basically no other option.
A concrete example: my previous work had to use Bitbucket Pipelines for our Docker builds. My current work uses GitHub Actions. GH has my container half-built before I can even click through to the page. Bitbucket took a good minute to start. My complaints about Bitbucket fell on deaf ears in the business, and no amount of leaving feedback for Atlassian to “please make builds faster” ever made the slightest amount of difference. Every time MS Teams comes up, it’s met with complaints about performance (among other things), so people definitely notice that. VS Code gets celebrated for “actually having decent performance”, so the bar is so low that even moderately performant apps receive high praise.
Users definitely notice performance, whether the PM/business cares is a different matter, but we should stop deceiving ourselves by saying it’s alright because “users won’t notice or care”.
Yup.. and if Facebook hadn't invested in a native mobile app, they'd have been eliminated 10 years ago.
Performance is a feature. Or does anyone here enjoy using an old TomTom GPS where every tap on the screen takes 2 seconds? If so, please donate your beefy laptops to charity.
But somehow modern software (Outlook, I’m looking at you) has trouble keeping up at my typing speed, and there’s a visible delay before characters appear on the screen. It doesn’t matter what the software does 99.9% of the time, if it’s an utter pig that crucial 0.1% of the time when the user is providing input.
An oft-recited rule of thumb: Make it work, make it pretty, make it fast - in that order. That is, performance bottlenecks are easier to find and fix if your code is clean to begin with.
I sometimes wish performance was an issue in the projects I work with, but if it is, it's on a higher level / architectural level - things like a point-and-click API gateway performing many separate queries to the SAP server in a loop with no short-circuiting mechanism. That's billions of lines of code being executed (I'm guessing) but the performance bottleneck is in how some consultant clicked some things together.
Other than school assignments, I've never had a situation where I ran into performance issues. I've had plenty of situations where I had to deal with poorly written ("unclean") code and spent extra brain cycles trying to make sense of it though.
> That is, performance bottlenecks are easier to find and fix if your code is clean to begin with
That is not at all what "make it work, make it pretty, make it fast" is about. That saying is about prioritization. Making it fast doesn't mean anything if it doesn't work.
However, if you are doing performance-sensitive work then this is a very bad strategy. You need to design a performant architecture up front otherwise you'll likely have orders of magnitude worse performance, even after optimizing your code.
Ex: if your "make it work" design has shared mutable state, you're going to have a bad time when you want to scale that horizontally and unlock 100x better throughput/performance.
Here is the thing: The paradigms of your initial design, or if you're in a better situation, the first refactor, are likely to stay. If performance is what you think about last, you choose different paradigms (like e.g. polymorphism). From my experience, a winning strategy is to think about things that scale already in the initial design, and what scales in a backend system is often the amount of data you want to pipe through it. Thus, keeping it data centric (primitive types and arrays for raw data), while using polymorphism and functional interfaces to keep the methodologies clean, is usually a good idea.
So, most code that I write tends to have dead simple data types, but combines it with classes (and subclasses) that represent methods (strategies) on how to retrieve, transform, present and store the data. The 'make it work' phase may do this in a simple script, but the actual data model tends to stay the same.
I think that I and all non-technical folks around me experience issues with application performance daily. I think that sometimes "making it work" should include some particular performance metrics. If it is not fast enough, it doesn't really work. Now, "fast enough" is something to be defined, and it differs depending on the part of the application.
Far too often I see applications that assume a low-latency, unbreakable Internet connection. They seem to do almost no caching at all, for example of thumbnails.
Also, many applications become almost unusable (or trigger OOM) when you try to work with a big file. Sometimes a big file is merely tens of MB; sometimes problems start with a 3MB file. Those are the issues that occur when nobody thinks about performance from the start: memory is free, you can copy things around, everything will fit in RAM.
One more thing. When your application consists of a client and a server, you may paint yourself into a corner by not thinking about performance early on. Everything works without any trouble at first, and then it turns out there are latency issues with more data and you can't easily upgrade the client, for example. Or you had an architecture that allows you to spin up more servers and handle the load closer to the client, but that can cut into your margins.
I don't think it is. There are times in projects where you're exploring the problem space and figuring out how to implement whatever you're doing, and to speed up that process you can take shortcuts such as assuming that a value always exists, that there are X many users, or that this function will never fail so you `expect()` on it. Also very little testing and no documentation. Once you have something that works, it's time to make it correct by removing those assumptions and adding tests and documentation. I believe this is what the distinction between "make it work" and "make it right" is.
I can measure "works" and "fast." Good can mean lots of things, it's a subjective assessment of code quality, readability, and maintainability. More experienced programmers probably have a better idea about how to achieve readable and maintainable code, but so far no one has come up with a measurable way to make code "good." Bob Martin spells out his ideas of "good" in his books but I don't find his examples particularly good, or even real-world. The only useful guide to "good" code I've come across in 40 years programming is The Elements of Programming Style, which Bob Martin has managed to expand from a thin pamphlet to multiple large books.
Sure I leave apps sitting around waiting for my input quite often. But then when I do start inputting stuff, they lock up, fail to respond to my inputs, show me loading screens, blank screens, delayed responses, and otherwise waste my time. Pretty silly given how much money I pay for the hardware.
It isn't about "micro-optimization", that's just what bad developers use as an excuse for never caring at all about performance. Casey uses the term "depessimization" to describe the process of making a program not run like shit.
Modern computers are ludicrously fast, and modern developers have somehow managed to make them slow.
Regardless, these programs should be spending 99% of their time waiting for user input, but instead they're working data through a mountain of abstraction layers on the faulty assumption that it is a) saving developers' time, and b) that that is worth much more than user time.
Emotionally we have no idea how bad it is to waste as little as a few seconds per day for millions of users. It's just a few seconds, right? We're just forgetting to multiply those seconds by the number of users.
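The multiplication is easy to run. The numbers below are made up for illustration (3 seconds per user per day, 5 million users, and a working year of 250 eight-hour days); only the arithmetic is the point.

```python
def wasted_hours_per_day(seconds_per_user: float, users: int) -> float:
    """Total user time lost per day, in hours."""
    return seconds_per_user * users / 3600


# Hypothetical inputs: 3 seconds lost per user per day, 5 million users.
hours = wasted_hours_per_day(3, 5_000_000)

# Assumed working year: 8 hours a day, 250 days a year.
working_years_per_day = hours / (8 * 250)

print(f"{hours:,.0f} hours of user time lost every day")
print(f"about {working_years_per_day:.2f} working years, every day")
```

With those inputs the loss is roughly 4,167 hours per day, which is a couple of working years of human attention spent daily on a delay that looks negligible for any single user.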
Accidentally super-linear algorithms happen a lot more when you hide stuff behind three layers of interfaces rather than seeing "oh, this uses a list instead of a set".
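A small sketch of how that happens, in Python. The `ListBackedRegistry` and `SetBackedRegistry` classes and the `unknown_ids` function are hypothetical. The caller sees only a `contains` method, so nothing at the call site reveals that one implementation makes the whole loop quadratic.

```python
class ListBackedRegistry:
    def __init__(self, ids):
        self._ids = list(ids)

    def contains(self, item) -> bool:
        return item in self._ids  # linear scan on every call


class SetBackedRegistry:
    def __init__(self, ids):
        self._ids = set(ids)

    def contains(self, item) -> bool:
        return item in self._ids  # hash lookup, O(1) on average


def unknown_ids(incoming, registry):
    # Looks linear. With the list-backed registry it is really
    # O(len(incoming) * len(registry)).
    return [i for i in incoming if not registry.contains(i)]
```

Both registries give the same answers, which is why the problem tends to survive code review and only shows up once the inputs grow.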
True story. And after taking a couple jobs optimizing some messy software modules I've learned to not take those jobs anymore. The only things you can do is throw the crap out of the window and to start afresh with a clean understanding of what we want to achieve. A simple and straightforward solution without BS abstractions is not only much faster but also much less bug-ridden.
> Look, most modern software is spending 99.9% of the time waiting for user input, and 0.1% of the time actually calculating something. If you're writing a AAA video game, or high performance calculation software then sure, go crazy, get those improvements.
Value of time is disproportionately weighted by user attention, which is at its highest right around when user input is happening.
>Look, most modern software is spending 99.9% of the time waiting for user input
That's fine, that's as it should be and isn't an interesting metric. The computer should wait for me, not the other way around. Needlessly waiting for the computer is a sign of s%&t software and not as uncommon as we'd like, huh?
Did you read the title? He's addressing performance.
And you're saying for many developers, performance is not the biggest priority.
That's fine. But it doesn't make him even 0.000001% wrong. And he's not applying anything to any niche situations. You just missed his point. Performance.
In my experience, software has been getting slower faster than hardware has been getting faster. Bringing focus back to performance would be a welcome improvement.
> If you have a GUI for a monthly task that two administrators use, then no.
Fine, but be honest with yourself and admit that you are contributing a lot to making the lives of those two admins miserable.
It doesn't matter if I'm using your software once a month, or once a day. If it's anything like typical modern software, it will make me hate the task I'm doing, and hate you for making it painful. In fact, shitty performance may be the very reason I'm using it monthly instead of daily - because I reorganized my whole workflow around minimizing the frequency and amount of time I have to spend using your software.
> Fine, but be honest with yourself and admit that you are contributing a lot to making the lives of those two admins miserable
The funny thing is I'm thinking of a specific case and I work closely with those admins. I even have filled in for them when they're sick. Yes I know it's a pain, they know it's a pain, and they let me know it's horrible. The only reason this one is monthly is that it's a stock take. They forgive the crappy performance as it still saves hours of work when compared to the previous manual option of entering things into multiple systems.
You’re not wrong, but today’s software is so slow/high latency so often, despite incredibly powerful hardware, that as a rule it should absolutely matter.
Like everything else in the software industry the context does matter: at larger scales small gains in performance translate to large savings in costs (infrastructure, maintenance, etc.).
Also, "clean code" (as in from the "Clean Code" book) is generally not good advice for most programs anyway. Not only does it eat performance, it's not all that great for building maintainable, extensible systems.
What does it matter if it does this 0.1% ten times slower than it could? Then the user will have to wait for the software, which slows down the most expensive component of the whole work setup, the human.
if the software ever takes more than 10ms to do something it is stealing time from the human. the human is the slowest part of the system so nothing else should ever make the slowest part even slower.
I think we absolutely should care, because when the devices do something, the user still expects software to be fast.
So if it is not fast enough, he buys a new device, because this is what he can do. He can't just buy the software rewritten. Usually
>most modern software is spending 99.9% of the time waiting for user input, and 0.1% of the time actually calculating something
Bad mindset.
A GFCI breaker spends 99.99999% of the time waiting with zero leakage current. Yet, when it does detect leakage current, you want the breaker to trip as quickly as possible.
See where I'm going?
Imagine if it would take 5 seconds from flicking a light switch until the lights actually turn on. Because the switch is waiting for user input 99.99% of the time, right? Would you install that light switch in your home?
Except when it isn't. I remember an article making rounds the other day, that claimed the whole "most software spend most time waiting on I/O" common wisdom is no longer true, as most software these days is CPU-bound, and a good chunk of that is parsing JSON.
> For pure compute most software is memory bandwidth bound
This is related, though. While the final leaps of performance often come from using more memory to make CPU work less, in my experience, most of the performance-problematic code wastes both CPU and memory, and meaningfully faster alternatives end up using less of both.
Either way, from the end-user perspective, I tentatively agree with the article I mentioned (but don't have a link handy, sorry) - most of the software I end up using, or see people around me using, is clearly CPU-bound, briefly network I/O-bound (mostly web apps), and rarely local I/O-bound.
EDIT: I guess the caveat is that end-user software is often spending 99% time doing nothing at all, just waiting for user input. But that bit doesn't matter - what matters is how fast it reacts to the input once the user starts providing it. This is where a lot of software suddenly gets CPU-bound (or net IO-bound, if doing something stupid).
Eh, that heavily depends on language and dataset you're working with. I've seen "simple" data with some fat thing like RoR on top of it having 10x the latency of the underlying database after all the ORMing.
Most of my experience with this actually comes from RoR. I worked on one app where rendering slowed everything to a crawl, but more often than not the problem was unoptimized queries, too many queries, or inefficient calls to other services. I used to work as a consultant, so this was true for quite a few apps. However, I was also usually the person most interested in relational databases, so my view might have selection bias.
I have a similar experience. Also, for the longest time Rails was also incredibly slow at generating JSON, of all things. JBuilder [1] was a few orders of magnitude slower than using .to_json directly or other libraries.
I’d like to work in one of these teams/products where the database is the bottleneck. Basically everywhere I’ve worked, I’ve had to work with backends that have been foot-draggingly slow compared to the actual database.
I think there is a relatively comfortable middle ground. High quality, readable and performant code are not mutually exclusive optimization quantities - though they will start to compete against each other in the extreme. Often, O(WTF) algorithms are complicated, full of useless fluff and hard to understand - occasionally attempting to follow Uncle Bob's unresearched and unexamined ideas.
Honestly in many cases you can have your cake and eat it too if you just write in a functional or data-flow style rather than a rigorous OO style. Since this is C++, using std::algorithm or another algorithms library would let you abstract your implementation details while relying on the compiler's ability to optimise/vectorise/inline code as needed.
This applies doubly so if you can rely on templates & structural typing to push your polymorphism to compile time. clang & gcc are surprisingly good at optimisation as long as you don't have to bounce off a vtable and the code is clean / avoids "manual optimisation".
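A minimal sketch of what that could look like, assuming two made-up shape types. `Square`, `Circle`, `area`, `total_area` and `mean_area` are hypothetical names for illustration, not code from the article:

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical shape types for illustration; not taken from the article.
struct Square { float side; };
struct Circle { float radius; };

// Plain overloads, resolved at compile time: no vtable is involved.
inline float area(const Square& s) { return s.side * s.side; }
inline float area(const Circle& c) { return 3.14159265f * c.radius * c.radius; }

// Works for any Shape type that has a matching area() overload.
template <typename Shape>
float total_area(const std::vector<Shape>& shapes) {
    return std::accumulate(shapes.begin(), shapes.end(), 0.0f,
                           [](float sum, const Shape& s) { return sum + area(s); });
}

template <typename Shape>
float mean_area(const std::vector<Shape>& shapes) {
    assert(!shapes.empty());  // the mean of an empty set is undefined
    return total_area(shapes) / static_cast<float>(shapes.size());
}
```

Each instantiation of `total_area` sees a concrete type, so the compiler is free to inline `area` into the loop. The tradeoff is that a single container can no longer hold mixed shapes without something like `std::variant`.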
Also while I'm not saying I don't believe the author here, I wish they would have used https://quick-bench.com/ or https://godbolt.org/ so that readers could trivially verify the results & methodology.
I don’t think the percentage of time waiting for input has anything to do with this. Outside of video games, the way most people will see performance problems is in the latency of their UI interactions. You press a button and want to see the result as fast as possible.
In other words, the user’s entire perception of your program’s performance falls into that 0.1%.
Performance is not an absolute. At the end of the day it is about user experience. From a computer science point of view, we can measure memory and CPU usage, but if the users haven't been complaining then what problems are you actually solving (at least from a product POV)?
Performance for performance sake is an interesting and appealing challenge to us engineers. I was writing C code in the 90s and I miss being that close to the hardware, trying to spare every clock cycle that I could while working with machines that had sparse resources.
But today I'm building SaaS products for millions of simultaneous active users. When customers complain about performance it is often not what us engineers think of as "performance." They're NOT saying things like "Your app is eating all memory on my phone" or "the rendering of this table looks choppy." It's usually issues related to server-side replication lag causing data inconsistencies or in some cases network timeouts due to slow responding services.
The point is the age old advice that we were giving aspiring game programmers back in the 90s:
Figure out and understand your priorities.
The famous inverse square root function in the Quake III Arena source code is a great example. If memory serves me, they needed this calculation as part of their particle physics engine. The problem is that calculating inverse square roots precisely is very expensive, especially at the scale they were required to. So they exploited how 32-bit floating point numbers are represented in binary in order to do a fast, good enough approximation. This is a good example of a targeted, purposeful optimization.
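For reference, here is a sketch of that trick in modern C++. The magic constant and the single Newton-Raphson step are the widely published ones; the function name `fast_inv_sqrt` and the use of `std::memcpy` instead of the original pointer cast are my own choices:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Approximates 1/sqrt(x) for positive, finite x using the well-known
// bit trick. std::memcpy is used for the float/integer reinterpretation
// to avoid the undefined behaviour of a pointer cast.
inline float fast_inv_sqrt(float x) {
    assert(x > 0.0f);
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);   // view the float's bits as an integer
    bits = 0x5f3759dfu - (bits >> 1);      // cheap initial guess
    float y;
    std::memcpy(&y, &bits, sizeof y);
    return y * (1.5f - 0.5f * x * y * y);  // one Newton-Raphson refinement step
}
```

The result is only an approximation (well under 1% off), which is exactly the point: it was good enough for the job. On current hardware a dedicated reciprocal square root instruction is usually the better choice, so treat this as a historical illustration.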
Back in the 90s we were obsessed over getting the most out of our hardware, especially when coding games. So we picked up all sorts of performance hacks and optimizations and learned how to code in assembly so that we could get even closer to the bare metal if we needed to. The result was impossible to understand and maintain code and so experienced engineers taught us young'uns:
Write clean code first, then profile to understand what your bottlenecks are, then to make TARGETED optimizations aimed at solving performance issues in order of priority.
That priority always being driven by user experience and/or scalability requirements.
Anything else is premature optimization. You're speculating about where your performance bottlenecks are, and you're throwing out maintainable code for speculative reasons; not actually knowing how much of an impact your optimizations are going to have on user experience.
I agree with almost every one of your points. I think I oversimplified my original point for the sake of clarity.
If you are throwing out maintainable code for the sake of performance, it had better be because you know that it's your bottleneck, and that the performance increase is worthwhile in the first place. "Performance for performance sake" shouldn't exist anywhere outside of hobby projects.
I would argue that responsive user interfaces are really important to user experience. Not many people are complaining because everyone is used to unresponsive apps, but that doesn't mean users wouldn't appreciate a more responsive app.
I would also add that there isn't always a tradeoff between performance and maintainability. If you can adopt some performant coding patterns that don't sacrifice maintainability, then absolutely do that. I think Casey's example of "switch-based polymorphism" is one such pattern(and I think the fact that Rust took a similar route to polymorphism is a vote in favour of this pattern).
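For anyone who hasn't read the article, the pattern can be sketched roughly like this. The names (`ShapeType`, `Shape`, `area`, `total_area`) and the convention that a circle stores its radius in `width` are assumptions for illustration, not Casey's exact code:

```cpp
#include <cassert>
#include <vector>

// Hypothetical shape representation: one struct, tagged with its kind.
enum class ShapeType { Square, Rectangle, Triangle, Circle };

struct Shape {
    ShapeType type;
    float width;   // for a circle this is assumed to hold the radius
    float height;
};

// All the per-type logic lives in one switch instead of in virtual overrides.
inline float area(const Shape& s) {
    switch (s.type) {
        case ShapeType::Square:    return s.width * s.width;
        case ShapeType::Rectangle: return s.width * s.height;
        case ShapeType::Triangle:  return 0.5f * s.width * s.height;
        case ShapeType::Circle:    return 3.14159265f * s.width * s.width;
    }
    assert(false && "unhandled ShapeType");
    return 0.0f;
}

inline float total_area(const std::vector<Shape>& shapes) {
    float sum = 0.0f;
    for (const Shape& s : shapes) sum += area(s);
    return sum;
}
```

This is close in spirit to a Rust `enum` with a `match`: adding a new shape means touching every switch, but all the shapes sit contiguously in memory and no indirect call is needed.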
Clean code (OOP, DRY, etc) is optimized for maintainability and extensibility, not necessarily performance.
In fact, I think it’s pretty well understood that clean code is a tradeoff wrt performance, at least that’s the way I’ve always understood it.
Clean code works well for something like a web app that’ll need to be maintained by scores of different engineers over many years or decades.
At least that’s the theory. In practice, at least some level of abstraction makes it a bit easier to rip and replace parts of the app without a total rewrite.
While I agree with Casey, for some situations it's hard to do. You can't really develop a web app in C# or Java in a simple way. Not only do you have to fight the language design, but all the frameworks and libraries are written with OOP, clean code and design patterns in mind.
So you might write your code in a straightforward way, careful not to lose efficiency, and then you are going to call slow code anyway.
So his advice is easier to follow in some scenarios than in others.
There is speed and there's the perception of speed. Some code (games) has to run fast. But most of the code we work on has to only have the perception of speed. If you're loading all of your resources and making the user stare at a twirly, you're doing it wrong.
What if you have a growing team of N people, and they need to be able to add shape subclasses / new shape logic all the time? As with anything this is a tradeoff. Often having decoupled code is more important than raw performance.
There is. The first one is that the code with the switch pattern can only process simple shapes. Let's say you want to get the area of a donut now. Well, you need to change the whole code to compute the area of the union. Or imagine that there are other places in the code that need to know the area of a single shape. Do they need to copy/paste the same code?
dude then why does most modern software feel like shit even on high performance systems?
Well written video games, the kind of thing Casey works on usually, beat the hell out of basically any other category of software in terms of user responsiveness. At least of all the software I use regularly.
Yeah, the article might be technically correct but is ultimately pointless. In almost every software engineering environment the priority is always going to be writing readable and composable code over something that runs 5 microseconds faster. All of your clever efficiency gains are anyways going to be wiped out by a single database call.
Not just general advice, but advice that is meant to be applied to TDD specifically. The principles of clean code are meant to help with certain challenges that arise out of TDD. It is often going to seem strange and out of place if you have rejected TDD. Remember, clean code comes from Robert C. Martin, who is a member of the XP/Agile gang.
The author cherry-picking one small piece out of what Uncle Bob promotes and criticizing it in isolation without even mentioning why it is suggested with respect to the bigger picture seemed disingenuous. It does highlight that testable code may not be the most performant code, but was anyone believing otherwise? There are always tradeoffs to be made.
Is testable code a tradeoff too much? That cannot be answered globally, only locally. That's why we have engineers, after all. You could eliminate the entire profession of engineering if "which tradeoff is best" could be answered broadly without deep understanding of specific circumstances.
This thread is, predictably, another demonstration of conflating optimisation with being aware of performance.
The presented transformation of code away from “clean” code had nothing to do with optimisation. In fact, it made the code more readable IMO. Then it demonstrated that most of those “clean” code commandments are detrimental to performance. So obviously when people saw the word “performance”, they immediately jumped “omg, you’re optimising, stop immediately!”
Another irritating reaction here is the straw man of optimising every last instruction: the course so far has been about demonstrating how much performance there even is on the table with reasonable code, to build up an intuition about what orders of magnitude are even possible. Casey repeated several times that what level of performance is right for your situation will depend on you and your situation. But you should be aware of what’s possible, about the multipliers you get from all those decisions.
And of course people bring up profilers: no profiler will tell you whether a function is optimal or not — only what portion of runtime is spent where. And if all your life you’ve been programming in Python, then your intuition about performance often is on the level of “well I guess in C it could be 5-10 times faster; I’ll focus on something else”, which always comes up in response to complaints about Python. Not even close.
Agreed, but most people here haven't paid for the rest of the course, and Casey sometimes forgot that here he's addressing a wider audience.
For instance he fails to explain why he didn't bother addressing the tail of his unrolled loop. He does in the course, but here he's just assuming it's irrelevant, and doesn't address again potential criticism like "he doesn't even bother to write correct code, look at that lazy unrolling!".
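To make the "tail" concrete, here is a sketch of a 4-way unrolled sum that does handle the leftover elements. `sum_unrolled` is a hypothetical function for illustration, not the code from the article or the course:

```cpp
#include <cassert>
#include <cstddef>

// Sums `count` floats with a 4-way unrolled main loop plus a scalar tail.
float sum_unrolled(const float* values, std::size_t count) {
    assert(values != nullptr || count == 0);
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    std::size_t i = 0;
    // Main body: four independent accumulators per iteration.
    for (; i + 4 <= count; i += 4) {
        acc0 += values[i + 0];
        acc1 += values[i + 1];
        acc2 += values[i + 2];
        acc3 += values[i + 3];
    }
    float sum = (acc0 + acc1) + (acc2 + acc3);
    // Tail: the 0 to 3 elements left over when count is not a multiple of 4.
    for (; i < count; ++i) sum += values[i];
    return sum;
}
```

Without the second loop the function would silently drop up to three elements whenever `count` is not a multiple of four, which is the kind of "lazy unrolling" criticism mentioned above.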
Despite being based on science, software development is full of bs and opinions, and lacks measurable metrics. How "clean" is this code, as a number between 0 and 1, for example? What's the metric there? Does using polymorphism automatically make your code clean? We cannot reliably reason about even single responsibility b/c there's no real way to count the responsibilities of a function. In such a la-la-land some desperately seek any formal measure, and there is one, performance: you can actually put a number on a piece of code. But you cannot do this for code "cleanliness". So some people prefer to ignore the unmeasurable trait and replace it with a measurable one.
Just because something is hard to measure, doesn't mean it's not important to do - I think there's even a fallacy for that?
I think it's also important to separate computer science from software engineering. Algorithms and data structures can be reasoned about mathematically, but what about agile VS waterfall or functional VS oop?
Maybe software engineering relates more to philosophy than math. There are theories and some are more sound than others, but there is no objective truth. Most of us agree that it's important for code to be readable, but we still don't have the answer for what the most readable code is. Even first principles such as DRY are challenged, e.g. by WET.
I recall vaguely that there was a situation where there were two schools of thought with different approaches. One was about hacking away things, the other about having correct programs. Maybe Berkeley vs MIT? Such different opinions at the basic level of a discipline are more likely in philosophy than math, I think.
>what level of performance is right for your situation will depend on you and your situation
A lot of my performance and code quality chops came from projects where the team was comfortable with the performance of the system but the business was not. They wanted to stop when it was right for them but not right for the situation. It ended up negatively affecting my opinion of them because ultimately I started to see it as deflecting. It's fine because I don't know what else to do, not because this is the best that can be done.
So he puts polymorphic function calls into enormous loops to simulate a heavy load with a huge amount of data to conclude "we have 20x loss in performance everywhere"? He is either a huge troll or he has a typical fallacy of premature optimization: if we would call this virtual method 1 billion times we will lose hours per day, but if we optimize it will take less than a second! The real situation: a virtual method is called only a few hundred times and is barely visible in profiling tools.
No one is working with a huge amount of data in big loops using virtual methods to take every element out of a huge dataset like he is showing. That's a false premise he is trying to debunk. Polymorphic classes/structs are used to represent some business logic of applications or structured data with a few hundred such objects that keep some state and a small amount of other data, so they are never involved in intensive computations as he shows. In real projects, such "horrible" polymorphic calls never pop up under profiling and usually occupy a fraction of a percent overall.
> The real situation: a virtual method is called only a few hundred times and is barely visible in profiling tools.
The reality is that the entire Java ecosystem revolves around call stacks hundreds of calls deep where most (if not all) of those are virtual calls through an interface.
Even in web server scenarios where the user might be "5 milliseconds away", I've seen these overheads add up to the point where it is noticeable.
ASP.NET Core for example has been optimised recently to go the opposite route of not using complex nested call paths in the core of the system and has seen dramatic speedups.
For crying out loud, I've seen Java web servers requiring 100% CPU time across 16 cores for half an hour to start up! HALF AN HOUR!
I bet that half an hour startup is not because of nested calls at all. I worked with a ton of Java code and if something is slow, it is usually shitty I/O related code or some algorithmic stupidity, not because of virtual calls and what not.
> So he puts polymorphic function calls into enormous loops to simulate a heavy load with a huge amount of data to conclude "we have 20x loss in performance everywhere"?
You're mistaken, the load size has nothing to do with the end result. The result is normalized to give an estimate of how much faster the simple code is than the polymorphic code regardless of input size. (Kinda like deaths per 100k instead of giving an absolute number of deaths for statistics about diseases).
So yes, your code is running 20x slower than it should be all the time.
Especially when you make every class an interface, with... get this, one implementation! This is based on real world experience and is not a joke. There are real companies with real people that write real code where every single class is an interface with exactly one implementation. Which, as Casey has shown, results in upwards of a 20x slowdown in the worst case.
Obviously, you probably won't get a 20x speedup by getting rid of the polymorphic garbage. But it's equally asinine to assume that polymorphic functions are only called a few hundred times. I guarantee you your PC is making millions of polymorphic function calls per minute between: the OS, the browser, windows Anti-Malware scanner, steam running in the background, oracle running its checks to remind you to update Java, etc. There are hundreds of processes running all the time on a modern device, these devices are wasting enormous amounts of resources.
> Especially when you make every class an interface, with... get this, one implementation! This is based on real world experience and is not a joke.
And when you run this through a profiler, you will not notice how slow your code is, because everything is slow. Slowness is infused throughout the whole system.
Just because you haven't been exposed to this issue doesn't mean it doesn't exist. "the real situation", "no one", "in real projects", "never pop up"...give me a break lol.
One can reasonably well guess/know the expected input sizes to their programs. You ain’t (hopefully) loading your whole database into memory, and unless you are writing a simulation/game engine or another specialized application, your application is unlikely to have a single scorching hot loop; that’s just not what most programs look like. If it does, then you should design for it, which may even mean changing programming languages for that part (e.g. for video codecs not even C et al. cut it, you have to do assembly), but more likely you just use a slightly less ergonomic primitive of your language.
> unless you are writing a simulation/game engine or another specialized application, your application is unlikely to have a single scorching hot loop
If everything was built with constraints like "this must serve user input so quickly that they can't perceive a delay", we would probably be a lot better off across the entire board.
We should try to steal more ideas from different domains instead of treating them like entirely isolated universes.
> If everything was built with constraints like "this must serve user input so quickly that they can't perceive a delay", we would probably be a lot better off across the entire board.
Sure but it’s a question of tradeoffs.
If the wall-clock-optimized version creates a bus count of 1 for that domain or is difficult for a dozen engineers to iterate on, then that could be worse for the business and users.
Should we want better software? Yes. Should we learn from other domains? Absolutely. But we should ultimately optimize for the domain we’re in, and writing much product-focused software is ultimately best optimized for engineering team and product velocity.
Another example: would most software benefit from formal verification? Yeah, I guess, but the software holistically—as a thing used by humans to solve problems—might benefit considerably more from Ruby.
Product velocity is not increased by spending a ton of time on complex type hierarchies and premature extensibility. It’s increased by having fewer classes and simpler functions that are faster to write tests for to get good coverage via end-to-end tests. The patterns of OOP increase the amount of time spent not solving the business needs. They also infuse the entire system with unnecessary slowness.
> You ain’t (hopefully) loading your whole database into memory
I've basically built my career for the past decade by pointing out "yes, we can load our whole working set into memory" for the vast majority of problems. This is especially true if you have so little data you think you don't have CPU problems either.
Databases are often not used by a single entity, so while I am very interested in your experiences, I think it is a great specialization for certain problems, but is not a general solution to everything.
All in all, I fail to see how it disagrees with my points.
> e.g. for video codecs not even C et al. cut it, you have to do assembly
This is largely inaccurate. Video encoders/decoders are typically written in C, with some use of compiler intrinsics or short inline assembly fragments for particularly "hot" functions.
Exactly, and they only decided to write those bits in assembly when they identified it as a hot code path AND the assembly outperformed any C code they could come up with.
The cases where assembly outperforms a higher-level language have also become rarer over time, thanks to compiler and CPU improvements.
I was talking about those hot functions only, not the rest of the program. But yeah a “sometimes” or a “may” would have helped in my original sentence.
But the point isn't that just the scorching hot loop is 20x slower, but that this penalty is paid everywhere. And it won't show up in profiling since there isn't a hotspot, it's death by 100 cuts.
Never look at the dispatch of a big company’s Java code base then. It’s dynamic dispatch 400 layers deep for a single network call or file op or small amount of math. Sure those are more expensive operations, but the dynamic dispatch has continually out scaled the problem.
But the company that uses Java quite often is writing the umpteenth line-of-business application, where dynamic dispatch is not going to cause a massive overhead
Occasional CPU architect here .... probably the worst thing you can do in your code is to load something from memory (the core of method dispatch) and then jump to it, it sort of breaks many of the things we do to optimise our hardware - it causes CPU stalls, branch prediction failures, etc etc
There is one thing worse you can do (and I caught a C++ compiler doing it when we were profiling code while building an x86 clone years ago) instead of loading the address and jumping to it push the address then return to it, that not only breaks pipelines but also return stack optimisations
FWIW i wasn't trying to make an optimizing compiler, i was experimenting with replacing an interpreter for a scripting language with a JIT, so even bad native code was still faster than the interpreter :-).
It wasn't really used anywhere, eventually i decided to keep the interpreter and move any complex logic in C which ultimately was the simpler approach (and which has been my take on scripting languages for years now: use scripting languages for the "what" and native code for the "how").
Yeah. I opened discord earlier, and it took about 10 seconds to open. My CPU is an Apple M1, running at about 3 GHz per core. Assuming it's single threaded (it wasn't), discord is taking about 30 billion cycles to open. (Or around 50 network round-trips at a 200ms ping).
Or as Casey would put it: Discord is taking 3.7moo ("Moon Unit") to open. A Moon Unit is equal to ~2.7 seconds, the maximum ping time to the moon. Therefore, if Discord had their servers on the moon, nobody would know the difference.
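The back-of-the-envelope numbers in these two comments can be checked with a few lines. The helper names below are made up for illustration, and the 2.7 second figure is simply the one quoted above:

```cpp
#include <cassert>

// Cycles a single core goes through in `seconds` at a clock rate of `hz`.
inline double cycles_spent(double seconds, double hz) {
    assert(seconds >= 0.0 && hz > 0.0);
    return seconds * hz;
}

// Number of sequential network round-trips that fit into `seconds`.
inline double round_trips(double seconds, double ping_seconds) {
    assert(ping_seconds > 0.0);
    return seconds / ping_seconds;
}

// Wall time expressed in "Moon Units" of ~2.7 seconds each.
inline double moon_units(double seconds) {
    return seconds / 2.7;
}
```

With the figures from the comments, `cycles_spent(10.0, 3.0e9)` is 30 billion, `round_trips(10.0, 0.2)` is about 50, and `moon_units(10.0)` is about 3.7.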
Exactly. The webpage is probably asking for resource from 10 different servers and one of them is a bit slower than the others, and the page rendering itself likely doesn't take very long.
No; I have no idea why it's so slow. It's kind of hard to tell - I guess I could use wireshark to trace the packets. But who cares? At least one of these things is true:
- It makes horribly inefficient use of my CPU
- It needs an obscene number of network round-trips to load
- One of the network servers that discord needs to open takes seconds to respond to requests
This isn't a new problem. Discord always takes about 10 seconds to open on my computer. (Am I just on too many servers?)
It should open instantly. Everything on modern computers should happen basically instantly. The only reason most software runs slowly is because the developers involved don't care enough to make it run fast.
Except for a few exceptions like AI, scientific computing, 3d modelling and video editing, modern computers are fast enough for everything we want to do with them. Software seems to have higher requirements each year simply because the developers get faster computers each year and spend less effort keeping their software tight and lean.
> The only reason most software runs slowly is because the developers involved don't care enough to make it run fast.
There is truth to that, but also:
* some of them would care if they knew what was possible with reasonable effort (that's what Casey is trying to address. So far in the course I'm not really seeing much that I could apply to the kind of code I write, sadly - but I'm hoping to learn stuff.)
* it's very likely that making performance-aware or optimized code takes just a tad longer than not doing it, and time-to-ship is valued much higher than time-to-run in most industries (this is the point I think Casey is overlooking, or at least not addressing enough. I don't know if it's by design - maybe he disagrees with the trade-off entirely - or if he's biased towards one of the few industries where time-to-run is crucial.)
Right; most teams optimize for velocity before performance.
This makes sense when you're a shiny new startup. But seriously, 10 seconds for discord to open? There's a point in every product's lifecycle where performance is a feature. Discord isn't a startup anymore. Why can't they fix these performance problems? At least discord is pretty snappy once it's loaded. The new reddit interface? It's a hog. But despite a massive outcry, why haven't they fixed it?
My pet theory is that they don't know how. And talking about velocity is just a smoke screen.
I think most professional engineers don't really understand the software stack well enough to be able to improve the performance of the software they write. It's pretty understandable - nobody asks about this stuff in job interviews. And the software stack only gets more complicated each year. If you follow React tutorials online, you can get pretty far adding features to a web app without ever needing to understand how React actually works. Or the web browser, or Vite / webpack / whatever, or the operating system it all runs on top of.
And that's a pretty good deal! More engineers! So long as we don't mind the new reddit site. And electron apps that take seconds to load.
Of course Casey Muratori knows how to write performant code. He understands the whole stack. He knows how to read the assembly that the C++ compiler produces. That's something more of us should aspire towards.
I wonder if it would be valuable to make an online course talking about performance engineering. I feel like it's one of those things that has fallen by the wayside, and I think that's a massive pity.
Which is precisely the point made couple comments up. Calling a lot of virtual methods in the critical path is peanuts compared to making a lot of network requests in said critical path.
But hey, those network calls are fast on my loopback interface, or my company LAN, when I'm playing with the dev version, using test set simulating 2 users and 5 posts for each. Surely it'll be just as fast for the real users, over the Internet, on channels with 1000 users and 5 posts per second.
It's the same problem. Virtual calls are degenerate, in-process RPCs. Or put another way, the reason you make tons of RPCs is the same reason you make tons of virtual calls: you consider services or subclasses to be cheap, so you use them a lot to mold your systems to organizational/people problems instead of the thing the software is supposed to do.
IMO the main difference is that for ~98% of people writing code, subclasses actually are very cheap. The performance losses (11 cycles per iteration?) aren't enough to dissuade me from organizing my code cleanly.
It's more than performance. Some of the worst code I've worked on has had too many layers of sub-classes, making it difficult to navigate and a real loss to developer productivity. After a certain point, it becomes OO spaghetti or, more accurately, "lasagna." At more than 3 layers, you really need to stop and think if it's necessary.
As in like a micro service? Ahahaha. Our CTO just pushed for microservices everywhere and we're not even that far along and we're chasing all kinds of performance problems. Insanity.
Just a quick nudge back on this: people in DSP would disagree with the assertion that nobody is going over big loops and using a virtual method on each element. We often have to process at least 88k elements per second in real time, through many many different processes. If any of those processes are defined using factories that spit out classes with polymorphic inheritance and virtual functions it certainly becomes an issue.
As a result some styles of writing code just don’t work for the audio thread at all, and we’d have to simply avoid or rewrite libraries written this way.
There are just some domains where standard practice for cleanliness is different because of your constraints.
I mean, it’s to the point we’ve got die hards in this industry who insist on putting all functions inlined in headers (not that I agree!)
I agree with your sentiment. But those things exist (not that that validates the author's argument) and I still shake in terror when I remember how, during covid, I was asked to take a look at a virus spread simulation (cellular automaton) that was written by a university professor and his postdoc team for software engineering at a large university. It modeled every cell in a 100k x 100k grid as a class which used virtual methods for every computation between cells. I rewrote that in CUDA with normal buffers/arrays, and an epoch ran in milliseconds instead of hours.
In all fairness to them, "simulating lots of things interacting with each other" is the poster child of OO. It's just that, well, it's not how CPUs work.
Then again, at some point we had "Lisp machines", maybe some day there will be a computer architecture where memory and computation patterns are adapted to massive simulation, rather than shoehorning it onto existing architectures.
And those will fail just as miserably as Lisp machines.
It should be noted that there were AAA games written in this fashion, and they were not slow. All method dispatch was virtual in UnrealScript, for example.
Well, for starters, many AAA games in Unreal had to have many core functions/classes rewritten from UnrealScript to C++ for performance reasons, where often not every call is virtual. Secondly, UnrealScript is not really a great example: on top of Unreal being notoriously on the slower end of game architectures, even Epic decided to drop UnrealScript.
And importantly, UnrealScript was designed in the '90s, when memory latencies were far less of a problem.
Of course C++ is faster, although that has more to do with being compiled rather than bytecode-interpreted. But even so, we played those games on hardware that's very slow by modern standards, and it was fast enough for competitive PvP, so I wouldn't describe it as "slow" in absolute terms.
The same problems happen when you have 1000 requests being dealt with simultaneously each working on small collections. Web Servers for real businesses do not sit idle, they churn at high % and reducing CPU load on them lets you save money, and/or improve latency for users, which can make you money.
So go on all of you, write everything in Python with 90 levels of indirection, my stock will go up.
Reminds me of a joke where a programmer optimized the most frequently used method in an imgur clone from 1s to 0.01s, because the customer complained the UI was slow to respond.
Congratulations. Pats on the back, champagne all around. Customers call. Same complaint.
Programmer asks "Well, did something change at least?". "Loading bar now flickers more", answers customer.
There is no doubting Casey's chops when he talks about performance, but as someone who has spent many hours watching (and enjoying!) his videos, as he stares puzzled at compiler errors, scrolls up and down endlessly at code he no longer remembers writing, and then - when it finally does compile - immediately has to dig into the debugger to work out something else that's gone wrong, I suspect the real answer to programmer happiness is somewhere in the middle.
When working with a larger code base, there will always be parts that you don't remember writing and you'll inevitably have to read the code to understand it. That's just part of the job/task, regardless of the style it's written in.
In shared code particularly with a culture of refactoring, there's no guarantee that the function call you see is doing what you remember it doing a year ago.
When I was coming up I got gifted a bunch of modules at several jobs because the original writer couldn't be arsed to keep up with the many incremental changes I'd been making. They had a mentality that code was meant to be memorized instead of explored, and I was just beginning to understand code exploration from a writer's perspective. So they were both on the wrong side of history and the wrong side of me. Fuck it, if you want it so much, kid, it's yours now. Good luck.
"Make the change easy, then make the easy change" hadn't even been coined as a phrase yet when I discovered the utility of that behavior. When I read 'Refactoring (Fowler)' it was more like psychotherapy than a roadmap to better software. "So that's why I am like this."
When we get unstuck on a problem it's usually due to finding a new perspective. Sometimes those come as epiphanies, but while miracles do happen, planning on them leads to disappointment. Sometimes you just have to do the work. Finding new perspectives 'the hard way' involves looking at the problem from different angles, and if explaining it to someone else doesn't work, then often enough just organizing a block of code will help you stumble on that new perspective. And if that also fails, at least the code is in better shape now.
Not long after I figured out how to articulate that, my writer friend figured out the same thing about creative writing, so I took it as a sign I was on the right track.
I do know that the first time I was doing that, it was for performance reasons. I was on a project that was so slow you could see the pixels painting. My first month on that project I was doing optimizations by saying "1 Mississippi" out loud. The second month I used a timer app. I was three months in before I even needed to print(end - start).
> there's no guarantee that the function call you see is doing what you remember it doing a year ago.
TDD provides those guarantees. If someone changes the behaviour of the function you will soon know about it.
That's significant because Robert 'Clean' Martin sells clean code as a solution to some of the problems that TDD creates. If you reject TDD, clean code has no relevance to your codebase. As Casey does not seem to practice TDD, it is not clear why he thought clean code would apply to his work.
It doesn't. TDD is about writing new code. It doesn't say anything about existing tests being sacrosanct, or pinning tests sticking around forever. I can extract code from a function and write tests for it. I probably know that there's still code that checks for user names but I can't guarantee that this code is being called from function X anymore, or whether it's before or after calling function Y. Those are the sorts of things people try to memorize about code. "What are the knock-on effects of this function" doesn't always survive refactoring. Particularly when the refactoring is because we have a new customer who doesn't want Y to happen at all. So now X->Y is no longer an invariant of the system.
TDD is about documenting behaviour. Which is why it was later given the name Behaviour Driven Development (BDD), to dispel the myths that it is about testing. It is true that you need to document behaviour before writing code, else how would you know what to write? Even outside of TDD you need to document the behaviour some way before you can know what needs to be written.
A function's behaviour should have no reason to change after its behaviour is documented. You can change the implementation beneath to your heart's content, but the behaviour should be static. If someone attempts to change the behaviour, you are going to know about it. If you are not alerted to this, your infrastructure needs improvement.
> A function's behaviour should have no reason to change after its behaviour is documented.
That's only true with spherical cows. That something happens is a requirement. When it happens is often only as specific as 'before' or 'after' but tests often dictate that they happen 'between', which is not an actual requirement, it's an accident of implementation. It was 'easy' to put it here.
Nowhere is it written that behavior in a system is strictly additive.
Systems are full of XY problems. When you recognize that, and start addressing that problem, you sprout a lot of tests for the Y solution and block delete tests for the X solution. That behavior doesn't exist in the system anymore because it's answering the wrong question. Functional parity tests can be copied, or written in parallel. But the old tests disappear with the old code (when the feature toggle goes away).
Leaving the code for X around is at best a footgun for new devs, and at worst a sign of hoarding behavior of an intensity that requires therapy.
You're espousing a process whereby you've nailed one foot to the deck, preferring form over function. Whether you believe what you're saying or not I can't say, but it's restrictive and harmful.
> Nowhere is it written that behavior in a system is strictly additive.
For a unit of the same identity to suddenly start doing something different is plain nonsensical, never mind the technical challenges that come with breaking behaviour that should scare anyone away from trying. Logically, a unit is additive until the unit is no longer used, at which point it can be eliminated.
> But the old tests disappear with the old code (when the feature toggle goes away).
Absolutely, but static analysis can easily determine that the tests being removed correspond with units being removed. If (TDD) tests are removed and the unit code isn't, something has gone wrong and your infrastructure should make this known.
Refactors compose. In three months you can completely rearchitect a module without breaking it at any point in the process. That’s the promise of refactoring.
Functions don’t have an identity. There is no such thing. I don’t know who taught you that but they have broken you in the process. Renaming things is a refactoring. We don’t check the entire commit history to make sure that function name has never existed. Only that it hasn’t existed recently. There’s no identity.
One of the reasons to refactor is that the function has been lying about its responsibilities. So you extract steps out of it, create a new call path that fixes the discrepancy, migrate the call sites, delete the incorrect function, and then, if the function name was really good, you might wait a while and rename the new function to the old name. Each step makes sense if you’ve followed the entire process. If you haven’t been following along at all then you have absolutely no idea how things got here until you read the git history thoroughly, which some people can’t do, and others won’t do if they expect the code to be static.
Also, to clarify, I'm talking about cumulative changes. If I'm working with someone on a feature then we both see all of the changes as they occur. If I'm off dealing with some long initiative, I may not look at that code for 3 months and so I miss all of the intermediate states that made perfect sense at the time.
Like visiting a friend who did their own house remodel. Their spouse saw all the steps, all you saw was before and after, and so the fact that the bathroom door is missing is confusing. The bathroom still exists, but now it's the master bath.
You seem to ignore that when the unit changes, the tests do too. If you come back a year later, foo.bar.baz(quux) might have been refactored and lazily so. The tests were also updated and still pass. You may jump into the code only to realize that someone no-op'd everything and never removed call sites. TDD is primarily a design tool, not a lock-into-implementation tool.
Of course any code requires some refresher at times, but the difficulty and time required to figure it out again is a spectrum that goes all the way down to the seventh circle of hell.
> There is no doubting Casey's chops when he talks about performance,
Remember that everyone has their blind spots!
I follow Casey on twitter, and a couple years ago there was a weird thread where he had hung his browser for 4-5 seconds by running some JS to assign CSS rules to ~50K div elements. And Casey was a million percent confident that the hang was due to JS being slow, and had nothing to do with CSS or DOM rendering.
If you're talking about Handmade Hero, the real answer to programmer happiness is not using a language you despise and refusing to leverage the features of, not refusing to use libraries in that language or frameworks, not re-implementing everything from first principles, and to actually have your game designed first (not designing while you code.)
Casey is a bad example of a game designer and he'll be the first to admit it. However, it is worth noting that Jonathan Blow very much does design while he codes and recommends the practice. He also generally abstains from library dependencies and implements a lot of things himself.
Of course, part of the point of Handmade Hero is to show that you can totally reimplement everything from first principles. Libraries are not magical black boxes, they're code written by human beings like you or me, and you can understand what they're doing.
For instance, he wrote his own PNG decoder[0] live on stream, with hardly any prior knowledge of the spec, even though I'm confident that under normal circumstances he'd just use stb_image. I'm sure he did this just to show how you'd go about doing that sort of thing.
[0] He only implemented the parts necessary to load a non-progressive 24bit color image, but that still involved writing his own DEFLATE implementation.
Is Blow even a good example to look at? He's released 2 games in 18 years which definitely had phenomenal gameplay but are not technically complex even for the time.
Yes, he is a good example. His games are as highly regarded as they are because he takes the time to really work through everything about them and ensure they're the product he wants. If you watch his streams you'll see that he is constantly experimenting with things, both gameplay-wise and in the engine... and now in his own compiler for his own programming language.
Jon is not making the by-the-numbers annual entry in the Call of FIFA series here.
But also, as a nit pick, Jon isn't just programming all the time, he's running a business and he is very involved in the indie game community and a founding member of indie fund. And when he is programming it isn't always for his own games. Here is a link to the credits page for him at MobyGames:
Look I'm a huge fan of his games! And I really can't overstate how influential his game design is, or the work he does for the indie community. However, I think a lot of folks tend to assume his software engineering skills are great because his game design is excellent. I don't think that's an earned position. Releasing 2 games in 18 years that are not pushing the technical envelope does not scream software engineering expertise. You say he has more credits but only one of those is programming since Braid. On the other hand, on a software engineering level those paint by numbers Modern Warfare and FIFA games are both more technically impressive and are designed for fast iteration.
Moreover he's pushing a particular paradigm and viewpoint that is in many ways the opposite of clean code. He pushes for still doing very low level design with minimal abstractions. But even at the time Braid could have been written in python SDL wrappers and probably had similar performance, and The Witness could have used Unity. If clean code is about maintenance and time to market, the Blow paradigm hasn't proven that its needed or fixes the holes in clean code. This is not to say clean code is perfect, just that Blow hasn't cracked the nut either, and I don't know why people act like he is the final word, or honestly even a respected voice, in game software engineering. On the other hand, if Blow wanted to talk about managing indie studios or game design my ears would prick up instantly.
> You say he has more credits but only one of those is programming since Braid.
He did start a company after that you know. A successful one that makes money and employs people to make art. I don't imagine that running a business takes no time from his life.
> But even at the time Braid could have been written in python SDL wrappers and probably had similar performance
Braid did a lot more than you give it credit for. Here's a GDC talk about the rewind system in which he explains some of the hurdles he had to deal with: https://www.youtube.com/watch?v=8dinUbg2h70&t=5s Pay particular attention to the discussion of the background particles and how to get that to work within the RAM constraints.
> On the other hand, on a software engineering level those paint by numbers Modern Warfare and FIFA games are both more technically impressive and are designed for fast iteration
Hardly anything about these games changes from release to release. They're not exploring new gameplay problem spaces, they're not doing anything super interesting or surprising on a technical level either and I don't get why you think they are. Of course if you keep using essentially the same engine and know exactly what you're trying to make, making another like a goddamned factory is going to happen quicker than if you're trying to make something unique and meaningful.
> If clean code is about maintenance and time to market, the Blow paradigm hasn't proven that its needed or fixes the holes in clean code.
I have heard "maintenance and scaling" as an excuse for poorly performing software for a long time now, yet what I'm not seeing is software that has features added on quick schedules and without bugs. So at best I'd say that it isn't accomplishing what it is supposed to and, at the same time, it is wasting our time and resources by producing slow software to boot.
The problem is people get hooked into Casey's misanthropic philosophies and take his rants as gospel, and wind up believing Handmade Hero represents the way game development should be done, that libraries and frameworks can't be trusted, that C++ shouldn't be used at all, and you should implement as much from scratch as possible, when none of that is true.
But the result of living in a contrived environment is that there is no guarantee that at the end you will understand 'normal' environments better, rather than just being over-trained on your fictitious one.
I've worked in projects where no one seemed to know SQL, where massive speed improvements were made by fixing very low hanging fruits like removing select * queries, adding naive indexes, removing N+1 queries etc.
Likewise, I've worked in code bases where performance had been dreadful, yet there were no obvious bottlenecks. Little by little, replacing iterators with loops, objects/closures with enum-backed structs/tables, early exits and so on accumulating to the point where speed ups ranged from 2X to 50X without changing algorithms (outside of fixing basic mistakes like not pre allocating vectors).
Always fun to see these videos. I highly recommend his `Performance Aware Programming` course linked in the description. It's concise and to the point, which is a nice break from his more casual videos/streams which tend to be long-winded/ranty.
Just taking the little bit of time to think about what the computer needs to do and making a reasonable effort to not do unnecessary stuff goes a long way. That 2x-50x factor is in fact very familiar. That’s something loading in a second rather than in a minute, or something feeling snappy instead of slightly laggy.
And it matters much more than people say it does. The “premature optimisation...” quote has been grossly misused to a degree that it’s almost comical. It’s not a good excuse for being careless.
One thing I find frustrating in game development, is we often put off optimization until the end of the project when we know we are happy with the game. But that means _I_ have to live with a slow code base for 2 years.
It takes 19 seconds for the main menu to load when you push play on our current game in Unity. It's killing me.
Meanwhile in my Lua side project, it's less than a second.
> I've worked in projects where no one seemed to know SQL, where massive speed improvements were made by fixing very low hanging fruits like removing select * queries, adding naive indexes, removing N+1 queries etc.
Can you recommend any SQL book with main focus on performance improvements like this?
I've learnt as I've needed it, so I'm afraid I don't have a single source to point at. Of the things that I've listed, a quick google search on each should give you enough info to be useful.
> Select * queries
Sometimes you only want two columns, but you ask for 5. Say you query a million rows, where you ask for but throw away 60% of all the data you get back.
> Naive indexes
As in, just slapping an index on a table that doesn't have one makes such a big difference that sometimes it's all you need.
> N+1 queries
This is more of a problem in ORMs, but any time you call the database N times instead of 1 time. A classic example is writing a for loop that asks for one row at a time, instead of asking for all rows once.
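To make the N+1 pattern concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `authors`/`posts` schema and the `run` helper that counts statements are made up for illustration; a real ORM would hide the per-row queries behind attribute access.

```python
import sqlite3

# Hypothetical schema for illustration: 100 authors, one post each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
""")
conn.executemany("INSERT INTO authors VALUES (?, ?)",
                 [(i, f"author{i}") for i in range(1, 101)])
conn.executemany("INSERT INTO posts (author_id, title) VALUES (?, ?)",
                 [(i, f"post by {i}") for i in range(1, 101)])

query_count = 0


def run(sql, params=()):
    """Run a query and count how many statements were sent."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def titles_n_plus_one():
    # 1 query for the authors, then 1 more per author: N+1 round trips.
    result = {}
    for author_id, name in run("SELECT id, name FROM authors"):
        rows = run("SELECT title FROM posts WHERE author_id = ?", (author_id,))
        result[name] = [title for (title,) in rows]
    return result


def titles_single_query():
    # One JOIN fetches the same data in a single round trip,
    # and names only the two columns it actually needs.
    result = {}
    rows = run("""SELECT a.name, p.title FROM authors a
                  JOIN posts p ON p.author_id = a.id
                  ORDER BY a.id, p.id""")
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result


query_count = 0
slow = titles_n_plus_one()
n_plus_one_queries = query_count

query_count = 0
fast = titles_single_query()
single_queries = query_count

print(n_plus_one_queries, single_queries)  # 101 1
```

Against an in-memory database the difference is invisible; the cost shows up once each of those 101 statements is a network round trip.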
I think historically people had a lot of bad intuitions about how effective non-composite indexes are in databases. That if I have an index for A and an index for B I should be able to do A & B and get a quick answer.
There's been a lot more educational material on composite indexes and in particular partial indexes and so I'm not sure if someone with 3 years' experience today can accurately judge a conversation talking about ten years ago.
If some field is referenced in a WHERE clause, add an index for it.
If there are a few fields referenced in a single WHERE, add a single index that includes all of them.
If you have an index that has a, b, c then it is as if you also had an index on a, b and an index on a.
If a condition in WHERE is =, put this field at the beginning of the index. If it's < (or similar), put it at the end. You'll get the best results if you have none or only one < in your query.
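These rules of thumb can be checked against a real planner. Below is a small sketch using SQLite through Python's sqlite3 module; the table `t`, its columns and the index name `idx_abc` are invented for the example, and other databases may plan differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- payload is deliberately left out of the index
    CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER, payload TEXT);
    -- equality columns first, the range column last
    CREATE INDEX idx_abc ON t (a, b, c);
""")


def plan(sql):
    """Return SQLite's query plan text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # 4th column is the detail text


# Two equalities plus one range: the whole index can be used.
plan_full = plan("SELECT payload FROM t WHERE a = 1 AND b = 2 AND c < 3")
# Only the leading column: the (a, b, c) index still serves as an index on a.
plan_prefix = plan("SELECT payload FROM t WHERE a = 1")
# Only the trailing column: there is no usable prefix here.
plan_no_prefix = plan("SELECT payload FROM t WHERE c < 3")

for text in (plan_full, plan_prefix, plan_no_prefix):
    print(text)
```

The first two plans should mention `idx_abc`. The third query has no leading column to search on, so I would expect a plain table scan there, though the exact wording of the plan varies between SQLite versions.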
This guy is so dogmatic about it it hurts. I would argue that clean code is a spectrum from how flexible vs how rigid you want your abstractions to be. If your abstractions are too flexible for good performance, dial them back when you see the issue. If your abstractions are too rigid for your software to be extendable, then introduce indirection.
We can all write code that glues a very fixed set of things end to end and squeeze every last CPU cycle of performance out of it, but as we all know, software requirements change, and things like polymorphism allow for much better composition of functionality.
Casey is a bit of a hardcore crusader on the topic, but I'd hardly call dogmatic someone who can provide you evidence and measurements backing their thesis.
The tests he put together here are hardly something I'd call a straw-man argument, they seem like reasonable simplification of real-cases.
Evidence in a micro benchmark of a single page of code.
The focus on performance here ignores the fact that most programs are large systems of many things that interact with each other. That is where good design and abstractions and “clean code” can really help.
Like all things it is about finding a balance and applying the right techniques to the right parts of a larger system.
True, the example is simplistic, but that's just to make it fit within reasonable exposition.
The author has in the past shown (elsewhere) how his techniques can actually make dramatic differences in more concrete examples (he rose to some Internet fame for building a performant shell that could actually handle larger outputs orders of magnitude better than most available alternatives).
To me the interesting point is the reminder that there is an innate tension between going fast and being "clean" (i.e. maintainable/understandable). And once you are aware of it you can make your decisions in an informed way. Too often this tension is forgotten/ignored/dogmatically put to the back ("performance doesn't matter over cleanliness" and the likes).
Mind you, I'm also of the camp that performance is very secondary to cleanliness in modern enterprises, but I appreciate a reminder of just how much we are sacrificing on this altar.
It was Windows Terminal he wrote something faster than. Windows terminal was doing a whole GPU draw call per character (at 2x standard terminal height, 160x25, it was as many draw calls as a AAA game).
Which is an excellent argument for why the WT team should have done some benchmarking, identified that they have a really dumb O(M*N) bottleneck in a critical part of their application (their rendering code), and optimized it.
It is not an excellent argument for why 'clean code' is not a better fit for the other 99% of their codebase.
I'd say that it really depends on what the code needs to do.
For example, if the code is to be distributed and extended as a third-party library, then the class hierarchy is probably a better fit to allow extensibility.
But if the purpose of the API is to compute the area of given shapes (as in the example), then it makes sense to make it efficient and there is no use to provide extensibility to the outside world.
The advocates of "clean code" that Casey mentions will go for extensibility no matter the use.
I find it very interesting to have numbers to weigh what you are leaving when you go the extensibility route instead of the non-pessimistic route.
I don't know.
Before reading Casey, I'd say most, if not all, of my APIs would look like the "clean code" style.
The process was "think about the model, design a class hierarchy that fits with the model, add the operations". Even if I don't think about extensibility, that's how I thought about my code.
Why? Because when I think about performance, I used to think about algorithms and I/O optimizations (for example batching).
Now I'll look into this data-oriented programming-thing and see how that applies to my platform (embedded Java) :D
I've seen at least one embodiment of the stereotype 3 months ago. It wasn't pretty, I was able to shrink his code by a factor of 5, and make it more flexible in the process. Among other mistakes he had some idea of flexibility, put part of the infrastructure in place, and then utterly failed to use it, such that when it came to test he required a monkey patching framework for C (Ceedling) so he could mock what he almost already mocked in the source code.
That's a fairly rare breed for sure, but the real problem was that nobody called him out. Well I did, but I'm no longer working there. I couldn't.
Performance measurements are only one dimension of code quality. Having a laser focus on it disregards why you would want to sacrifice performance for a different dimension of code quality, such as extensibility for different requirements.
You should check if your code is in the hot path before optimizing, because the more you couple things together the harder it is to change it around. For instance, in Casey's example, if you wanted to add a polygon shape but you've optimized calculating area into multiplying height x width by a coefficient, that requires a significant refactor. If you are sure you don't need polygons, that's a perfectly fine optimization. But if you do, you need to start deoptimizing.
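A rough sketch of that trade-off, in Python rather than the article's C++ and with class and function names of my own choosing: the table version collapses every shape into "coefficient x width x height", which is compact until a shape arrives that does not fit the formula.

```python
import math
from abc import ABC, abstractmethod


# Style 1: one subclass per shape, area() dispatched dynamically.
class Shape(ABC):
    @abstractmethod
    def area(self) -> float: ...


class Rectangle(Shape):
    def __init__(self, width, height):
        self.width, self.height = width, height

    def area(self):
        return self.width * self.height


class Triangle(Shape):
    def __init__(self, base, height):
        self.base, self.height = base, height

    def area(self):
        return 0.5 * self.base * self.height


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius * self.radius


# Style 2: flat data plus a coefficient table, no dispatch at all.
# A circle is passed as width = height = radius.
AREA_COEFFICIENT = {"rectangle": 1.0, "triangle": 0.5, "circle": math.pi}


def area_from_table(kind, width, height):
    return AREA_COEFFICIENT[kind] * width * height


# A polygon is a new subclass in style 1, but it has no single
# coefficient, so style 2 would need a different data layout.
class Polygon(Shape):
    def __init__(self, vertices):
        self.vertices = vertices  # list of (x, y) pairs in order

    def area(self):
        # Shoelace formula over consecutive vertex pairs.
        total = 0.0
        for i, (x1, y1) in enumerate(self.vertices):
            x2, y2 = self.vertices[(i + 1) % len(self.vertices)]
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0


print(Rectangle(4, 3).area(), area_from_table("rectangle", 4, 3))
print(Polygon([(0, 0), (4, 0), (4, 3), (0, 3)]).area())
```

Python will not show the cycle counts the article measures; the sketch only illustrates the coupling, not the speedup.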
> they seem like reasonable simplification of real-cases.
Paraphrasing Russ Ackoff, doing the right thing and doing a thing right is the difference between wisdom or effectiveness and efficiency. What Casey is doing here may be efficient, but calculating a billion rectangles doesn't present a realistic or general use case.
"Clean Code" or any paradigm of the sort aims to make qualitative, not quantitative improvements to code. What you gain isn't performance but clarity when you build large systems, a reduction in errors, reduced complexity, and so on. Nobody denies that you can make a program faster by manually squeezing performance out of it. But that isn't the only thing that matters, even if it's something you can easily benchmark.
Looking at a tiny code example tells you very little about the consequences of programming like this. If we program with disregard for the system in favour of performance of one of its parts, what does that mean three years down the line, in a codebase with millions of lines of code? That's just one question.
Unfortunately, everything in our profession is a tradeoff. Faster and maintainable are two of the many quality metrics you can optimize for that will be at odds at times. What the right balance is for a given piece of code depends on so much context. It's a hard balance to get right.
These examples are absolutely a strawman. He's imagining there's one specific access pattern that's executed thousands of times per second. In a realistic codebase you're accessing the data less often but in multiple different (often subtly so!) ways. Cache efficiency is everything for modern CPUs, so you can't "simplify" the access patterns without making your benchmarks unrepresentative.
Yes... but I'm also seeing dogmatism from the opposing camps here in the comments section.
The reality is that how flexible your interfaces and abstractions are and their design has to be a part of your original design considerations when building something. It's a bad move to just hand wave away performance concerns because you religiously adhere to some design patterns. It's also a bad move to drop down to using intrinsics for everything from the get-go and thinking you know better than the compiler when it's a codepath that isn't even computationally expensive or a bottleneck a priori.
I think part of the problem with supposed "clean" code is that it tends to be a matter of opinion. Is the polymorphic version cleaner than the switch statement version? I would argue the latter is actually easier to read. There's no real reason to think "clean" code is actually clean other than anecdotes and that someone wrote it in a book, but the performance is something that can be objectively measured.
The "Clean Code" that Casey is talking about is a book and a code philosophy that was explained in depth in talks and trainings and seminars, so I would disagree that it is a matter of opinion.
I think it really truly depends. I think it's always good to do the minimal viable thing first instead of being an architecture astronaut, but if you've been asked for three (random ballpark number) different implementations for the same requirement it might be time to start adding some indirection.
The best idea in clean code is to stop coupling domain models to implementation details like databases/the web/etc. Once you grok that, then you're in a better position to work on eliminating unnecessary coupling within the model itself.
There's lots of ways to do this poorly and well. There's no process for it. That's a feature. I feel like a lot of the flak clean code gets boils down to, "I followed it dogmatically and look what it made me do!" It didn't make you do anything; it's trying to teach you aesthetics, not a process. Internalize the aesthetics and you won't need a rigid process.
Obviously when you do this you probably need more code than you'd normally write. That can be viewed as a maintenance burden in some situations, esp. when you don't have product market fit. Again, this shows that treating clean code like some process that always produces better code in every situation is extremely naive.
There was a moment in grade school where I was sat down and it was explained to me that you don't have to take a test in order. You can skip around if you want to, and I ran so far with that notion that at 25 I probably should have written a book on how to take tests, while I could still remember most of it.
One of the few other "lightning bolt out of the blue" experiences I can recall was realizing that some code constructs are easier for both the human and the compiler to understand. You can be sympathetic to both instead of compromising. They both have a fundamental problem around how many concepts they can juggle at the exact same time. For any given commit message or PR you can adjust your reasoning for whichever stick the reviewer has up their butt about justifying code changes.
you have Japan infrastructure, and you have Turkey infrastructure
6.1 quake in Japan = nothing destroyed
6.1 quake in Turkey = everything collapses
The engineers in Turkey probably didn't value performance and efficiency
It's the same for developers, you choose your camp wisely, otherwise people will complain at you if they can no longer bear your choice
You act innocent, but your code choices translate to a cost (higher server bills for your company, higher energy bills for your customers/users, time wasted for everyone, rare materials depleted at a faster rate, growing tech junk)
Selfishness is high in the software industry
We are lucky it's not the same for the HW industry, but it's getting hard for them to hide your incompetence, as more things now run on a battery, and the battery tech is kinda struggling
Good thing is they get to sell more HW since the CPU is "becoming slower" lol
So now we've got smartwatches that need to be recharged every damn day
Yeah, it's a bad and unnecessarily inflammatory (and arguably disrespectful) example. But the rest of the points GP makes are spot on.
A "blame systems over individuals" version would be that the industry is externalizing bad performance onto users, damaging environment, causing frustration, wasting lives, and occasionally even actually killing people (shitty ER/hospital software comes to mind) - because there's no good feedback mechanism to force software companies to internalize those costs.
The shapes example is pretty contrived so I don't really have an opinion on it either way. But imagine you have something like a File interface and you have implementations of it e.g. DiskFile, NetworkFile, etc., and you anticipate other implementors. Why would you do anything other than have a polymorphic interface?
I think the shapes example is more of a dig at various game engines where you end up with long trees of inheritance (physicsbody -> usercontrollable physics body -> renderable user controllable physics body, etc.) as opposed to the recent trend of using something like an Entity Component System.
Also I think he isn't specifically against "clean code" but how the first tool used by various "clean code advocates" seems to be polymorphism via inheritance. I have seen this enough in lots of Java codebases and "Enterprise C++ codebases". "We need X." "Oh first I will create an Abstract Base class for X, then create X, so we can reuse X nicely elsewhere when we need it". It is still on the developers to understand that they may not even need it but for them it is "clean code".
I wish he would've gone a bit more into why people reach for polymorphism instead of a struct with a variant enum: because it's how OOP is taught, and how it easily maps to human understanding of most problem spaces. like you're making an RPG, so you make an Item class, and then you make Armor and Weapon subclasses, and then you make Helmet/Chestplate/Leggings/Shoes and Sword/Polearm/Axe/Mace/Staff subclasses, etc. etc., just because it fits your mental model of the problem, when in actuality, you could've totally avoided all of that complexity, and the ensuing boilerplate, just by sticking it all in one struct and calling it a day. this is not always the solution, but more often than not it is.
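A minimal C++ sketch of the "one struct plus a kind tag" idea from the comment above. The item kinds, field names, and helper function are made up for illustration and do not come from the article:

```cpp
#include <cassert>
#include <string>

// Hypothetical item kinds for an RPG; the names are illustrative only.
enum class ItemKind { Helmet, Chestplate, Sword, Staff };

// One plain struct for every item, instead of an
// Item -> Armor/Weapon -> Helmet/Sword/... class hierarchy.
struct Item {
    ItemKind kind;
    std::string name;
    int armor;   // meaningful for armor kinds, 0 otherwise
    int damage;  // meaningful for weapon kinds, 0 otherwise
};

// Behaviour that would have been a virtual method becomes a switch on the tag.
inline bool isWeapon(const Item& item) {
    switch (item.kind) {
        case ItemKind::Sword:
        case ItemKind::Staff:
            return true;
        case ItemKind::Helmet:
        case ItemKind::Chestplate:
            return false;
    }
    assert(false && "unhandled ItemKind");
    return false;
}
```

The trade-off is that every item carries every field, which is fine while the set of kinds is small and starts to hurt when the kinds diverge a lot.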
That does depend on how the abstraction is defined, of course. I once worked on optimising a 2D Canvas C++ class which had a nice top level virtual interface so you could replace it with a different implementation. It was also crazily slow because it defined:
virtual void setPixel(int x, int y, int color) = 0;
and then implemented flood fill etc in terms of that.
This isn't necessarily bad if all the other operations are also virtual. The idea is that you can quickly get something working (e.g. when porting) just by implementing setPixel(), and then gradually fill in other primitives with properly hardware-optimized versions. And you do need a virtual setPixel() in any case because some API client might need that one call.
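A rough sketch of the design described above, assuming a hypothetical `Canvas` interface (the class and method names other than `setPixel` are invented here). Only `setPixel()` is mandatory; other primitives have a slow default written in terms of it, which a backend can override later:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical canvas interface: a port only has to implement setPixel().
class Canvas {
public:
    virtual ~Canvas() = default;
    virtual void setPixel(int x, int y, int color) = 0;

    // Slow but portable default: one virtual call per pixel.
    virtual void fillRect(int x, int y, int w, int h, int color) {
        for (int row = y; row < y + h; ++row)
            for (int col = x; col < x + w; ++col)
                setPixel(col, row, color);
    }
};

// A backend that replaces the default with a bulk write per row.
// No bounds checking, for brevity.
class MemoryCanvas final : public Canvas {
public:
    MemoryCanvas(int width, int height)
        : width_(width),
          pixels_(static_cast<std::size_t>(width) * height, 0) {
        assert(width > 0 && height > 0);
    }

    void setPixel(int x, int y, int color) override {
        pixels_[y * width_ + x] = color;
    }

    void fillRect(int x, int y, int w, int h, int color) override {
        for (int row = y; row < y + h; ++row) {
            auto first = pixels_.begin() + (row * width_ + x);
            std::fill(first, first + w, color);
        }
    }

    int pixelAt(int x, int y) const { return pixels_[y * width_ + x]; }

private:
    int width_;
    std::vector<int> pixels_;
};
```

The slowness in the original story comes from never overriding the defaults, so every pixel of a flood fill goes through the virtual call.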
You can still easily get something working just by implementing setPixel without virtual dispatch, the linker has no problem inlining that call at compile time.
If some arbitrary API needs it to be virtual it's easy to implement the virtual call in just that specific case, instead of burdening your entire system with a virtual call that'll always be static in practice.
For the same reason why you need a virtual drawLine etc. Because some code wants to render something, and doesn't want to care if it's rendering onto an on-screen surface, into a file etc.
Your entire system will not be burdened with virtual calls in places where you use concrete implementations (so long as they're final/sealed, anyway). The overhead is only there if you try to use the abstraction generically, but why would you do that in the case where the virtual call "will always be static in practice"?
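To illustrate the final/sealed point, here is a small sketch with invented names (`Surface`, `ScreenSurface`). When the static type is a `final` class, the compiler is allowed to call the method directly and inline it. Whether it actually does depends on the compiler and optimization level:

```cpp
#include <cassert>

// Hypothetical rendering target interface.
struct Surface {
    virtual ~Surface() = default;
    virtual int bytesPerPixel() const = 0;
};

struct ScreenSurface final : Surface {
    int bytesPerPixel() const override { return 4; }
};

// Generic use: the call goes through the vtable unless the
// optimizer can prove the dynamic type.
inline int rowBytesGeneric(const Surface& s, int width) {
    assert(width >= 0);
    return s.bytesPerPixel() * width;
}

// Concrete use: ScreenSurface is final, so nothing can override
// bytesPerPixel() and the call may be resolved statically.
inline int rowBytesConcrete(const ScreenSurface& s, int width) {
    assert(width >= 0);
    return s.bytesPerPixel() * width;
}
```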
I don't know, I've never done anything where code needed to have runtime dispatch on the kind of file it has without also knowing anything about what kind of file it has.
Have you never used C++ iostreams? Or the Python file abstraction? Or Rust std::io::Read/std::io::Write? Or Node.js streams? Or DOM Web Streams? Or Ruby files? Or Haskell conduits? Or hell, fopencookie/funopen in C?
The abstraction is super common and allows you to connect streams to each other without worrying about the underlying mechanism, which 99% of the time I don't really think you want to worry about unless you're sure it's a performance overhead. And that's great, because I surely don't want to write specializations by hand for all the different combinations of streams I need to use if I don't have to.
I use generic readers and writers every day, but they don't have any runtime dispatch. The question was "Why would you do anything except vtables and runtime dispatch?" One answer is that my code that uses generic readers and writers and gets monomorphized gets to also be generic over async-ness.
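The comment above reads like it is about Rust generics. A rough C++ analogue of a monomorphized generic reader is a function template, sketched below with a made-up `StringReader` type and `countBytes` helper. This only shows the "no runtime dispatch" part, not the async part:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical reader. Any type with
//   std::size_t read(char* out, std::size_t max)
// can be used with countBytes() below.
class StringReader {
public:
    explicit StringReader(std::string data) : data_(std::move(data)) {}

    // Copies up to max bytes into out; returns 0 at end of input.
    std::size_t read(char* out, std::size_t max) {
        assert(out != nullptr);
        const std::size_t n = std::min(max, data_.size() - pos_);
        data_.copy(out, n, pos_);
        pos_ += n;
        return n;
    }

private:
    std::string data_;
    std::size_t pos_ = 0;
};

// Instantiated separately for each Reader type: no vtable involved,
// and the read() call can be inlined.
template <typename Reader>
std::size_t countBytes(Reader& reader) {
    char buffer[64];
    std::size_t total = 0;
    while (std::size_t n = reader.read(buffer, sizeof buffer))
        total += n;
    return total;
}
```

The cost is one copy of `countBytes` per reader type in the binary, which is the usual trade against a single function taking a virtual interface.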
I don't like most of these "principles", as anyone can verify by looking at my previous comments, but this article is cherry-picking to its utmost level of unfairness.
These "clean code" principles should not be, and generally are not, used in performance-critical code, in particular computer graphics. I've never seen anyone seriously try to write computer graphics while "keeping functions small" and "not mixing levels of abstraction". We can go further: you won't be going anywhere in computer graphics by trying to "write pure functions" or "avoiding side effects".
These "clean code principles" are, however, rather useful for large, corporate systems, with loads of business process rules maintained by different teams of people, working for multiple third parties with poor staff retention. You don't need to think about vector performance for processing credit card payments. You don't need to think about input latency for batch processing data warehouse jobs, but you need these types of applications to work reliably. Way more reliably than a videogame or a streaming service.
Right tools for the right jobs, people need to stop trying to hammer everything into the same tools. This is not only a bad practice in software, it's a bad practice in life, the search for an ever elusive silver bullet, a panacea, a miracle. Just drop it and get real.
> you won't be going anywhere in computer graphics by trying to "write pure functions" or "avoiding side effects"
Not sure about this; in my experience (in a different domain, audio processing) you totally can get away with both of these a lot of the time.
Function inlining works well, so you can write small pure functions in a lot of cases (especially if you accept a function that reads from one buffer and writes to another as pure).
As for avoiding side effects, this is normally more about keeping your state updates small and localised (allowing more parts to be pure), which is often not a problem performance-wise.
IME it's much easier to improve the performance of a piece of code which is easy to reason about and change with some level of confidence that your optimisation will not break things.
I know there's some unavoidable global state in computer graphics, but presumably there is lots of code that doesn't directly touch that.
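A small sketch of what the audio comment describes, with made-up function names: tiny pure helpers composed inside a loop that reads one buffer and writes another. Helpers this small are typically inlined by an optimizing compiler, though that is not guaranteed:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Small pure helpers: the result depends only on the arguments.
inline float applyGain(float sample, float gain) { return sample * gain; }

inline float hardClip(float sample) {
    if (sample > 1.0f) return 1.0f;
    if (sample < -1.0f) return -1.0f;
    return sample;
}

// Reads from one buffer and writes to another; touches no other state.
inline void process(const std::vector<float>& in, std::vector<float>& out,
                    float gain) {
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = hardClip(applyGain(in[i], gain));
}
```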
Exactly! I work with a lot of high-performance code and also a lot of non-high-performance code (think all the plumbing around the core computation) and I definitely use a lot of "clean code patterns" in the non-performance-critical parts. They're the ones that tend to change more, that more people touch, that get done faster... It's just about knowing what to use and when.
> These "clean code" principles should not be, and generally are not, used in performance-critical code, in particular computer graphics.
I agree that this is mostly true, but maybe not for beginners in the field
When I was reading "Raytracing in One Weekend" (known as _the_ introductory literature on the topic), I was very surprised to see that the author designed the code so objects extend a `Hittable` class and the critical ray-intersection function `hit` is dynamically dispatched through the `virtual` keyword and thus suffers a huge performance penalty
This is the hottest code path in the program, and a ray-tracer is certainly performance critical, but the author is instructing students/readers to use this "clean code" principle and it drastically slows down the program.
So I agree most computer graphics programmers aren't writing "clean code", but I think a lot of new programmers are being taught them because of introductory literature
Yeah, but why are good and fast opposites rather than orthogonal? Why aren't languages and compilers built so that we can at least do both? Or even better, why aren't good and fast the same thing in computer languages?
But I'd point out that languages and compilers are built so we can have both.
The problem is some definitions of good really aren't, and it isn't just because they're slow, it's because they make some things "good" at the detriment of others. Uncle Bob's Clean Code is really good at making some portions of the code "good" and "simple", but when put together they are interconnected in ways that are more difficult to understand.
Yes and no, I think that what can happen if you only focus on "performance critical code" is the rest of your code is slow and incredibly unfriendly to cache, but there's absolutely nothing you can really do about it past a point. Like, if all your code is 5x slower than it possibly could be, then even after you fix the performance critical bits you have this ugly tax everywhere else that you can't do anything about other than a rewrite. And I do think that matters, look at various projects written in a language like python where bits have been written in C, but you still have all this slow interpreted stuff. (I like Python btw, I'm just making a point that if performance matters it doesn't just matter in loops)
I don’t understand why there is still the false dichotomy between performance and speed of development/readability. Arguments on HN and in other software circles suggest performant code cannot be well organized, and that well organized code cannot be performant. That’s false.
In my experience, writing the code with readability, ease of maintenance, and performance all in mind gets you 90% of each of the benefits you’d have gotten focusing on only one of the above. For instance, maybe instead of pretending that an O(n^2) algorithm is any “cleaner” than an O(n log n) algorithm because it was easier for you to write, maybe just use the better algorithm. Or, instead of pretending Python is more readable or easier to develop in than Rust (assuming developers are skilled in both), just write it in Rust. Or, instead of pretending that you had to write raw assembly to eke out the last drop of performance in your function, maybe target the giant mess elsewhere in your application where 80% of the time is spent.
A lot of the “clean” vs “fast” argument is, as I’ve said above, pretending. People on both sides pretend you cannot have both, ever, when in actuality you can have almost all of what is desired in 95% of cases.
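As a concrete instance of the O(n^2) versus O(n log n) point above, here is a sketch of duplicate detection done both ways. The function names are illustrative. Neither version is noticeably harder to read than the other:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// O(n^2): compare every pair of elements.
inline bool hasDuplicateQuadratic(const std::vector<int>& values) {
    for (std::size_t i = 0; i < values.size(); ++i)
        for (std::size_t j = i + 1; j < values.size(); ++j)
            if (values[i] == values[j]) return true;
    return false;
}

// O(n log n): sort a copy, after which duplicates are adjacent.
inline bool hasDuplicateSorted(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    return std::adjacent_find(values.begin(), values.end()) != values.end();
}
```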
I'd even go so far as to say that "clean" code is a requirement for performance optimization. For a loose definition of clean.
Code that is unreadable, tightly coupled, untestable or just messy is much, much harder to work in than code that is readable, loosely coupled, well-tested and clean. This has been proven often and is really a no-brainer. Performance-optimizing is finding the bottleneck, then rewriting that without changing the functional behaviour. For this you need, respectively, readability (to find bottlenecks you must be able to understand flow and code), the ability to rewrite (tightly coupled code cannot be rewritten in isolation) and assurance that the behaviour doesn't change (test coverage).
Ergo: a clean architecture is a requirement to make code more performant in the first place. Even if that architecture is bad for performance in itself, it enables future improvements.
I find it funny that people keep throwing non-caps "clean" code here in this post without knowing the context, you now have "clean architecture" too, and it made it more difficult to know what you're talking about.
Clean Code is actually a book by Uncle Bob. And Clean Architecture is the name of another book by him.
What Casey is criticizing isn't "good code". He's criticizing Uncle Bob's philosophy.
If you have a large amount of I/O and can see the latency tracked and which parts of the code are problematic, optimize those parts for execution speed.
If you have frequent code changes with an evolving product, and I/O that doesn't raise concerns, then optimize for code cleanliness.
Never reach for a solution before you understand the problem. Once you understand the problem, you won't have to search for a solution; the solution will be right in front of you.
Don't put too much stock in articles or arguments that stress solutions to imaginary problems. They aren't meant to help you. Appreciate any decent take-aways you can, make the most of them, but when it comes to your own implementations, start by understanding your own problems, and not any rules, blog titles, or dogmas you've previously come across.
Let's take your example of a program with a lot of I/O. A straightforward way to optimize that is to find a way to reduce the number of I/O operations you do.
And once you do that, the bottleneck shifts. You're spending less time in I/O, both in an absolute sense and relative sense. So you might run into a new non-I/O bottleneck that was just drowned out in the noise before. So you optimize that ...
And sometimes this goes on for many iteration cycles and you end up with a 100-1000x performance improvement.
I think this is the point. In the example given, if you introduce a sqrt, then the argument becomes much weaker already. The dispatch is comparable to the computation time. I'm reminded of "Latency Numbers Every Programmer Should Know".
I actually just had this debate with myself, specifically about shape classes including circles and bezier curves. However, the operation was instead intersections. There was zero performance difference in that case after profiling so I kept the OOP so that the code wasn't full of case statements.
This comment section just shows how so many developers are victims of group think. Here is actual evidence that at least hints that the primary paradigm is wrong, and immediately a bunch of nerds jump and attack, instead of taking the criticism in good faith.
Compare this to discussions about FP, new languages like Rust, and so forth. This really demonstrates that the mindset currently in vogue is increasing complexity and hierarchy to the detriment of all else, and is why the supposed new paradigms of "modern software development" are not really that new but just evolutions of the current paradigms. You can tell what a culture's sacred cows are by what attracts criticism without any sincere rebuttal.
I'm not sure why you're bringing up FP and Rust here. The idiomatic FP and Rust versions of the code in the article would be similar to the fast version: you use algebraic data types to represent your Shape type (which are just tagged unions under the hood, exactly like in the article) and pattern match on the type, which can be compiled to a jump table (exactly like in the article).
And I think neither FP nor Rust discourage the internal-representation-dependent 10x optimization. They only discourage doing this across module boundaries, but the style of programming encouraged by FP and Rust encourages putting your datatype variants together in one module (unlike OOP).
So with those languages you're much more likely to naturally arrive at a fast solution than with traditional OOP.
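For readers who want to see the ADT style in the thread's own language, a rough C++17 analogue uses `std::variant` and `std::visit`. The shape types and the `AreaOf` visitor are made up for illustration. How `std::visit` is compiled (jump table or otherwise) is up to the implementation:

```cpp
#include <variant>

// Hypothetical shape variants, all defined together in one place.
struct Circle { double radius; };
struct Rectangle { double width, height; };
struct Triangle { double base, height; };

// A tagged union: exactly one alternative is active at a time.
using Shape = std::variant<Circle, Rectangle, Triangle>;

constexpr double kPi = 3.14159265358979323846;

// The overload set plays the role of a pattern match.
struct AreaOf {
    double operator()(const Circle& c) const { return kPi * c.radius * c.radius; }
    double operator()(const Rectangle& r) const { return r.width * r.height; }
    double operator()(const Triangle& t) const { return 0.5 * t.base * t.height; }
};

inline double area(const Shape& shape) { return std::visit(AreaOf{}, shape); }
```

Leaving an alternative out of `AreaOf` is a compile error, which is similar in spirit to an exhaustiveness check on a pattern match.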
Agree. Typical FP with ADTs would lead to the same kind of data-driven style. The point at which he went too far for me is probably the struct layout optimisation (every shape has two defining numbers, it feels a bit restrictive). For me that’s the point where I’d check whether it’s really warranted and if so create new types to distinguish the original domain (Shape) vs. number crunching (ShapeDefinedByTwoNumbers - needs a better name).
In fact, it perversely proves the point. If 99.9% of apps aren't caring about performance - that's probably why my $2000 computer is slow to open a text file.
I want all of my applications to be well performant. Using the modern web is an atrocious experience most of the time. It's strange when I stumble upon a mostly-unchanged "web 2.0" era site. It loads instantly, like, shockingly fast. It doesn't have all the SPA widgets and "interactivity" but it works, loads extremely fast, and doesn't turn my computer's fans on.
My computer is several orders of magnitude faster than my computer from 2001. Yet the applications I want to use feel slower.
People say that programmer productivity is more important than efficiency, but I don't think programmers are more productive. We've just lowered the bar.
If 99% of apps are slower, that's everything I do on a computer! Yes it matters!
It doesn't matter for corporate environments perhaps. It does matter a hell of a lot for consumer-facing web stuff, both the front end and the backend! 99% is a lot of stuff, so yes it does matter.
And the top reply (right now) claims that 99% of the time is spent waiting for user input, and no, that isn't even true. A lot of that "input" is waiting on the network, and the number of requests any application makes per unit time definitely scales with increasing code complexity.
But anyway, maybe that makes a tenuous argument that most code does not care about performance, but again, if it's 99% of code, then yes it matters because it's my entire computer, and that's how we have machines that are faster than they've ever been yet they struggle to edit text compared to, say, Emacs on a Pentium 4.
90 percent of apps are slow and unresponsive, and are harming people by being so, but software developers aren't directly harmed by the effects of their actions, so the situation continues.
>It simply cannot be the case that we're willing to give up a decade or more of hardware performance just to make programmers’ lives a little bit easier. Our job is to write programs that run well on the hardware that we are given. If this is how bad these rules cause software to perform, they simply aren't acceptable.
That is not our job! Our job is to solve business problems within the constraints we are given. No one cares how well it runs on the hardware we're given. They care if it solves the business problem. Look at Bitcoin, it burns hardware time as a proof of work. That solves a business problem.
Some programmers work in industries where performance is key but I'd bet not most.
> CPU cycles are much cheaper than developer wages.
Please just stop with this. It's plainly false.
At $dayjob I recommended some simple database query tuning that 1 developer applied in their spare time. This improved performance from 9 seconds per page to 500 milliseconds per page.
That customer wanted to use auto-scale to expand capacity (nearly 20-fold!) to meet the original requirements, which would have cost about $250K annually.
The dev fixed the issue in like... a week.
What developer costs $250K per week!? None. None do. Not even the top tier at a FAANG.
Not to mention the time saved for the thousands of users that use this web application. Their wasted time costs money too.
In this case you have massive scale so a small amount of developer time can equate to a big reduction in CPU cycles.
Those are the constraints you work with. Those are the business problems you are solving. So yes this was justified. It was justified on the numbers.
My point is that your job is only to tune performance iff there is a solid business case for it.
I spent a week reducing a page's load time because the business saw the load times as a problem for customer acquisition. The cost of my time was justified. Meanwhile we had a task that for 6 years took over an hour to run. I optimized it to take less than a second. That change has offered zero business value and was a waste of time. It runs periodically on a server that is most often idle. There was no justification for that work, other than it taught me to align my efforts with the business.
CPU cycles are much cheaper than developer hours. But yes, if you use enough of them they will cost more than a developer.
It's not a massive scale at all! It's just "Enterprise" edition software (SQL Server, etc...) that costs $$$ to scale up.
The actual production platform is just a dozen or so virtual machines, they're not even that big.
Not everybody has their own data centre where they pay cost-price for all-Linux servers that run only free software.
In a typical business that hosts their servers in the public cloud, it's not unusual for a single database index to allow savings of tens of thousands a year.
E.g.: SQL Enterprise on an Azure E8bds_v5 VM is $2,678 per month, but the next step up to E16bds_v5 is $5,355 per month!
If you can optimise a database so it can move from 16 vCPUs to 8 vCPUs then that alone saves $32K annually. This is ignoring ancillary costs such as the second DR instance, etc...
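The arithmetic behind the $32K figure, written out as a tiny C++ sketch. The two monthly prices are the ones quoted in the comment above and have not been checked against current Azure pricing. The function name is made up:

```cpp
// Annual saving from dropping one VM size, given monthly prices in USD.
constexpr int annualSavingUsd(int largerMonthlyUsd, int smallerMonthlyUsd) {
    return (largerMonthlyUsd - smallerMonthlyUsd) * 12;
}

// Prices as quoted in the comment: E16bds_v5 at $5,355 and E8bds_v5 at $2,678.
// (5355 - 2678) * 12 = 2677 * 12 = 32124, i.e. roughly $32K per year.
static_assert(annualSavingUsd(5355, 2678) == 32124,
              "matches the ~$32K annual saving quoted above");
```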
Sure that makes a lot of sense, I'm sorry for my bad assumptions, but that drives my point home even more. The problem is not related directly to your performance on the hardware. It's an additional cost from the business environment. License costs can drastically alter where your efforts are best spent. I'd imagine there are times where you might be better off pulling huge chunks of data out of the database to process on machines that don't have such a big license cost.
CPU cycles are cheap, SQL Server licenses are extortionately expensive. While the costs are tied to the server they run on, if you can offload to a different CPU not tied to that license model you can still take advantage of the low cost of CPU cycles.
But people aren't. Any code that "makes people wait" is wasting people's time. The only way to make code take less time is to optimise it, because capacity != latency. You can't get a 20 THz processor. You can buy more capacity, but you can't buy more speed!
At FAANG scale, it's common to hire "top" developers at hugely expensive annual total comp to tune stdlib code like "string" for just 1-2% efficiency gains because at their scale that might be 1,000 to 10,000 fewer servers.
At a small scale, budgets are tight.
At "enterprise" scale, staffing (user) costs are high, and license costs are high.
I can't think of a typical business scenario where compute and/or associated per-core-licensing costs can be blindly disregarded with a flippant statement like "developers are expensive and infrastructure is cheap".
>I can't think of a typical business scenario where compute and/or associated per-core-licensing costs can be blindly disregarded with a flippant statement like "developers are expensive and infrastructure is cheap".
Say you're a small business with an in-house server rack. Not a software company but say a manufacturing business. Not enterprise scale, smaller. You have 1 development resource. It turns out the ETL server is overloaded and it's causing the reports that run on the same server to run slow. You could get the developer to spend a few weeks porting the legacy system to a faster modern option to speed up the ETL and maybe improve some of the reports. But it would be far cheaper to buy another server for $3k and have the developer spend less than half a day moving the ETL onto the fresh server.
On which... it'll run maybe 20% faster, because that's the scale of single-threaded processor speed improvements these days. Not to mention that now there's a network hop involved, which will eat into any CPU gains.
Very few apps scale well with increasing core counts, and then hit a wall around 64 cores for almost everything.
Okay, okay, fine. The ETL is natively parallel code and somehow, magically, it can read inherently sequential file formats like multi-gigabyte CSV or JSON files in parallel. This tiny org already has 10 gigabit switches, SFPs, and everything.
Did you upgrade the database server too? No? Now the shiny new ETL server is twiddling its thumbs while the database server is getting overloaded.
Suddenly this option is "not so cheap". You have to buy a new database server and... uh-oh... it's Microsoft SQL Server, Oracle, or SAP HANA, and the licensing is going to eat half your tiny little company's profits for the year.
Did you forget the OS license, backup agent license, anti-malware license, and so on? I bet you did. All of those are extra, and either per-machine or per-core.
Someone has to set this all up. Small non-IT shops typically outsource this to an IT service company. They'll explain all the extras that turn a $3K purchase into a $30K purchase (including on-site assistance to install everything).
Or that 1 guy could have just taken a 10 minute look at the ETL logs, discovered that "SELECT * FROM HugeTable" is unnecessary, and fixed the problem.
Don't forget your electricity costs, A/C costs, hardware maintenance costs... these are often overlooked, but are not nothing. You've added permanent recurring expenses to avoid your one-time developer fee.
Yes, sometimes this might make sense --- if the dev fees are going to be exorbitant, or if you just can't afford to pull a dev off a project to work on it. Other times, it makes more sense to pay the dev...
You keep saying, CPU cycles are cheaper than developer hours, but this is nonsensical without quantities attached to each. How many CPU cycles, on what kind of machines? How many cycles do those machines have to spare? What's the performance per watt? How many developer hours at what kind of salary? There's way too much missing info to be making such a statement.
>You keep saying, CPU cycles are cheaper than developer hours, but this is nonsensical without quantities attached to each. How many CPU cycles, on what kind of machines? How many cycles do those machines have to spare? What's the performance per watt? How many developer hours at what kind of salary? There's way too much missing info to be making such a statement.
Yeah that is fair. There is a lot of missing information. I'm only trying to get across that nowadays developer time is often expensive and hardware is often cheap and powerful.
I don't like the idea that our goal is always to make the most efficient code possible. It's not, it's to deliver a business solution as efficiently as we can with the resources available. Just like you wouldn't want to pay for a mechanic to spend a week making your car more fuel efficient if you took it in to get new engine mounts. His job is to make your car work for you, not work the best it possibly could.
That said most of the time you do want to be writing efficient code.
> I don't like the idea that our goal is always to make the most efficient code possible. It's not, it's to deliver a business solution as efficiently as we can with the resources available. Just like you wouldn't want to pay for a mechanic to spend a week making your car more fuel efficient if you took it in to get new engine mounts. His job is to make your car work for you, not work the best it possibly could.
If he could get me a 1000x gain in fuel efficiency, like you often see in software when performance overhauls are done, I would sure as hell give him his week. But this is less about maintenance and more about how the car/software is built the first time around.
In that vein, I do expect that if gasoline prices drop to $0.20/gallon in a decade (hah), fuel economy on new cars will not drop to 3mpg to match. That's essentially what seems to have happened in software --- the hardware got really fast, so software got really slow.
> It's not, it's to deliver a business solution as efficiently as we can with the resources available.
This is true; I guess what I take issue with is externalizing hidden costs to the customer. We keep paying for faster and faster hardware, and have to because that old hardware which is still working perfectly fine can't run the new software, which is much slower. And often, even on new hardware, the software is just this side of "tolerable". If you're writing your own in-house tool and nobody cares, do whatever suits.
> That said most of the time you do want to be writing efficient code.
Yes. That's all I want. Reasonably efficient. Not balls-to-the-wall speed demon witchery like we saw back in the demoscene heyday, just not to be sliding backwards all the time to erase all the gains our hardware got us.
---
EDIT: I get where you're coming from with the business incentives, I really do. But I'm saying I have a lot of issues with the end result --- as is often the case, maximal profit for the business is wreaking havoc elsewhere, in a sort of tragedy of the commons effect. And there are no realistic ways for me as a consumer to alter business incentives. A lot of software, especially the type that people get paid to write, is closed source and closed protocol.
An excellent example is Discord --- it works well enough, when it works, but it's kind of a big heavy behemoth. It doesn't run well on older computers (I frequently see it burning a whole core just sitting in a voice channel). Right now, it's using nearly 1GB(!) of RAM, and frequently climbs to 2 or more if I let it run long enough. This is a program whose core functionality was essentially available to me in 1999, and it struggles if I try to run it on a 4GHz machine from 2012. The search function sucks imo, and I have various other complaints. Screensharing with audio is broken (because Electron), and probably always will be.
And I can't do a damn thing about it. I can't use a different platform, because the platform I use is determined by the people I want to talk to that are already using it. I can't improve or fork the program, because it's closed source. I can't (realistically) use a different client, or even write my own, because it's a closed protocol. So I'm just stuck with this pig of a program, and no amount of rage or frustration that I feel will alter the company's business incentives.
But it's just one program, right? Okay, I've got the cycles to spare, and the RAM, well, I overprovisioned this machine, so (in the case of this one, relatively modern machine), it's not the end of the world... right?
Now add Spotify. It's the same damn problem, so now the problem is 2x. Add a web browser (I mean, one whose job is actually to browse the web). I manage to draw the line there, mostly, but a lot of people are stuck with a lot more (VS Code, etc). It all adds up to a nightmare. And yet, all the time, I hear how "performance doesn't matter" (not exactly your words, but a prevalent developer sentiment).
>My point is that your job is only to tune performance if there is a solid business case for it.
I think it's very, very hard to put a cost on performance.
A few seconds here or there is very draining on people. How do you measure if people are avoiding doing things, or putting off work because their tools are janky? How do you measure how much time people spend complaining about how slow their computer is?
If they are a normal business admin they are very replaceable. So you don't have to make life easy for them unless they can justify the cost of doing so.
How busy are they?
If you have a 1 EFT position filled by an administrator that is only 60% busy then there is no cost in slowing them down until they're at 100% capacity. Then you might think about making life easier for them to avoid having to hire the next administrator. If people avoid doing their job because they don't like their tools that is a disciplinary matter. Time to find an administrator that can do the job required.
> If they are a normal business admin they are very replaceable. So you don't have to make life easy for them unless they can justify the cost of doing so.
That is a mentality which is too horrifying to find proper words to describe.
> If people avoid doing their job because they don't like their tools that is a disciplinary matter. Time to find an administrator that can do the job required.
It's human nature to avoid difficult things. Discipline only goes so far; it happens subconsciously.
And anyway, small inefficiencies add up to mountains, but it can be hard to see what's going on when everything is just pebbles everywhere.
I dislike government intervention, and so disavow my following statement.
In the course of fining hidden externalized costs like those of dumping chemical waste into rivers, governments might fine software developers when they dump informational waste onto customers' machines, and if governments do, the developers will deserve it.
Anyone who ships an electron app goes to the gulag.
The problem is I think you're also disagreeing with yourself in a way.
> That is not our job! Our job is to solve business problems within the constraints we are given. No one cares how well it runs on the hardware we're given. They care if it solves the business problem. Look at Bitcoin, it burns hardware time as a proof of work. That solves a business problem.
Cost is a business problem. It is a constraint. The problem is sometimes it doesn't feel that way because a lot of folks around you use bad practices and it's easy to horizontally scale things so performance is often brushed off.
Now that the era of 0% interest has ended, companies are actually starting to take a look at what they're running and... surprise! Taking even 10% longer to write quality, performant code can yield 2-3x (sometimes I've seen 1000x) improvements which means much lower cost.
> Some programmers work in industries where performance is key but I'd bet not most.
Because most don't know any better. Unfortunately this is a symptom of bootcamps, etc. that have created a lot of "programmers" that only know how to code vs. "engineers" that know how to build systems that are maintainable, scale, and solve complex problems.
In almost every company I've been at, they inevitably look back and realize how wasteful they were. In the ML world at least, it's a bit different I suppose because performance matters from the start.
I 100% agree. What I'm trying to get across is that you have to identify the cost to justify the performance tuning. If the cost of the tuning is greater than the cost incurred by not doing it, you might not want to do it.
What are your actual costs and how do they line up? Are you writing ML and big data? Then processing costs are huge. You can probably win by spending money on developer time to reduce processing costs. On the opposite end of the scale, are you writing CRUD for a small business? Then the development costs are likely to outweigh any costs from inefficiencies in the application's code.
To me the article read as if our job was to make every bit of code as fast as possible. I think we should only spend time on code that meets the greater goals of the system it operates in.
If you identify that run time is a cost then you have to identify what part of the code is the bottleneck and fix that. Then you have to look back to see if any more improvements will be worth the time invested.
> To me the article read as if our job was to make every bit of code as fast as possible.
My interpretation was that he thinks the baseline for performance is too low. There are kind of two parts to performance: non-pessimization and optimization. He’s mostly harping on the former point, that you don’t have to go tuning anything. Just by writing code without following the clean code rules you can get a huge speed up for free, without any tuning at all. It’ll be free of obvious overheads, even if it doesn’t do anything special to fully utilize the hardware.
There are 1000x gains on the table! Almost all software is so far from Pareto-optimal it's comical! It isn't a tradeoff between readability and efficiency; most software is unreadable and inefficient!
When every function in your program is virtual, fixing bottlenecks won't give you a significant speedup. You don't need to optimize programs to make them 1000x faster, we just need to stop shooting ourselves in the foot.
Well yes it kind of is? Everyone solves problems. We solve problems with computers.
And we use them because they’re autonomous, remember exact details and are very fast and reliable.
There’s of course some level of good enough. We don’t write ad-hoc scripts in assembly.
But to say dev time is more expensive than computer time only makes sense if programs are actually fast. Fast, reliable feedback loops matter. Consistency matters. And simplicity matters in many dimensions.
Web application servers that are orders of magnitude (N times) slower than they should be (not even _could_ be) cost us N times more hardware resources, N times more architectural complexity that requires specialized workers and tools, and so on.
Speed and throughput matter for productivity. Not just ours but our users’ as well. Good performance is important for good UX. Wasting fewer cycles opens up opportunities to do meaningful things.
Among the most popular languages is Python. It is popular in spite of its bad performance, high memory use, and lack of CPU multithreading.
And it is heavily run on servers.
Why? Because running Python apps is still much cheaper than hiring humans to wait for calls or manage e-mails.
Humans are valuable. They should not be working on easily automatable problems.
The bottleneck is automating AT ALL, rather than automating with a low machine cost. Only at huge scale (i.e. Big Tech with billions of daily events) is it warranted to optimize the code.
Of course, assuming you have a sane computational complexity. If you don't, it doesn't matter which paradigm you use.
OK but... computationally heavy code in Python isn't generally really running in Python. Almost all of it either calls out to optimized C/C++/Fortran (e.g. NumPy, which in turn calls BLAS and LAPACK) or goes through some other separate compilation step (e.g. what ML libraries do with XLA, or Cython).
Sure, python web servers are a thing. Heck, we use them a fair bit. But it's a deliberate decision from the start where it makes sense to use it for that purpose.
>But it's a deliberate decision from the start where it makes sense to use it for that purpose.
Exactly that!
There is a time and a place for performant code but it's not the only metric we need to take into consideration. Sometimes you're better off using the slower tool or algorithm as it improves things on a dimension other than performance.
> There is a time and a place for performant code but it's not the only metric we need to take into consideration.
Of course not. But then, nobody is really complaining about apps they think are fast enough. The problem is, when something is noticeably slow, you complain about it, file reports etc, and are met with stiff resistance.
What's important is not that you make the machine go as fast as it can possibly go at all times. What's important is to know how fast it can go, so that you're aware of just exactly how much you're leaving on the table. The actual amount, in most cases, would, I expect, surprise most people...
But the gist of the video doesn’t disagree at all.
In fact the resulting code was very clear, easy to write and understand. He got a 15x improvement by removing indirection and OO cruft. I don’t think he’s saying “don’t use language X” here, but rather “don’t make it harder for yourself and the computer”.
But it does disagree. Python is a dynamic language where essentially everything is indirect (an object).
Hardcoding types and methods (so they compile to simple/fast machine code as the video proposes) takes away flexibility (though you can do that with Cython, by the way; it is just not nearly as popular).
I don’t know/use Python. But I assume that it has hashmaps and vectors that are backed by something performant and possibly JITed?
The table driven, static dispatch approach Muratori uses isn’t necessarily “hard coded” in the general case. It’s just data driven, or table driven as he puts it. It’s not less flexible, since you can more easily add (append!) new cases.
In fact it reminds me of how I do it in higher level languages like JS and Clojure.
Of course we’re paying a substantial performance tax when using higher level, dynamic languages. We’re paying that tax in order to get very tangible benefits.
But we can still program in a way that is data oriented, simple and easy to work with by the compiler and runtime (JIT).
I think there are some ethical aspects of this that are missing.
Writing inefficient code has an environmental impact and a human impact.
I still lean heavily towards clean code, because clean is easier to grep and bugs have a much higher impact on my career, but efficiency imho should be addressed in the language or compiler, not in the code.
everything with computers being slow and janky these days despite running on hardware that we could have only dreamed of mere decades ago is somebody's fault—if not ours, then whose?
you can only excuse-kick the blame-can down the time-road so far.
Why? What if the user is a machine operator? He is standing around waiting for the machine to finish its current cycle. As long as he can get his data entry done in the time he is waiting it costs the business nothing.
I am not sure how to explain that "user time is sacred" is from the point of view of the user, not of the business where the user is treated as a cog in a machine.
This is exactly the point. Each bubble has different goals.
For example:
The finance industry heavily uses Excel. The inefficiencies this brings to performance are huge. However, the ability to get a domain expert to maintain the 'code' is worth it.
There are always trade offs. Even if you are in a performance sensitive industry your job is still not to write the most efficient code possible. You are there to make the best use of the resources you have to produce the best product you can. Sometimes that will mean doing things that harm performance but help in other ways.
> Our job is to write programs that run well on the hardware that we are given.
The author seems to be neglecting the fact that the whole point of “clean code” is to improve the likelihood of achieving the first goal (code that runs well, i.e. correctly) across months/years of changing requirements and new maintainers. No one (that I’ve ever spoken to or worked with, at least) is under any illusions that you can almost always trade off maintainability for performance.
Admittedly, I think a lot of prescriptions that get made in service of “clean code” are silly or counterproductive, and people who obsess over it as an end unto itself can sometimes be tedious and annoying, but this article is written in such incredibly bad faith that it's impossible to take seriously.
> The author seems to be neglecting the fact that the whole point of “clean code” is to improve the likelihood of achieving the first goal (code that runs well, i.e. correctly) across months/years of changing requirements and new maintainers.
Yes, that is the whole point of "clean code". Thing is, it failed.
Simplicity is better achieved with other methods. Forget Uncle Bob and SOLID, read John Ousterhout (A Philosophy of Software Design) instead.
I'm certainly not going to defend Uncle Bob here, but I don't think you have to be a SOLID cultist to think that preferring smaller functions, factoring out oft-repeated patterns, and relying on a language's abstraction features (e.g. polymorphism) are useful heuristics for managing complexity and maintainability in a codebase. Like, the article's author complains at length about the overhead of a virtual function call over a switch statement; I don't think their critique is really focused on the nuanced differences between the various schools of "how to make your code nicer to read" (at least, I didn't read it that way).
That being said, A Philosophy of Software Design looks like a good read. Thanks for the recommendation.
> I don't think you have to be a SOLID cultist to think that preferring smaller functions, factoring out oft-repeated patterns, and relying on a language's abstraction features (e.g. polymorphism) are useful heuristics for managing complexity and maintainability in a codebase.
They look like reasonable heuristics indeed, but they're perfectible.
Factoring out repeated patterns, that's good. Keep that one. But I would advise to wait for the pattern to emerge in the first place, so that when you factor it you know exactly what abstraction you need. https://caseymuratori.com/blog_0015 (Semantic Compression)
The size of functions is a fairly poor heuristic. The main size you want to minimise is that of the entire code base. At the function (and class/module) level, it's better to minimise API/implementation ratios. That's how you know the function (or class, or module) is useful: small easy to learn APIs that hide significant implementations give you leverage, and help you minimise what you need to keep in mind whenever you're writing a new piece of code.
Relying on a language's abstraction features… yeah, I guess, though I generally avoid class based polymorphism. It's a bit heavy for my taste, especially when we have the ability to just pass closures around instead. Often that poor man's object is all you need.
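To make the "poor man's object" idea concrete, here is a minimal C++ sketch that passes closures around instead of defining a class hierarchy. All names (`area_fn`, `make_square`, `make_rectangle`, `total_area`) are made up for illustration and do not come from the article or the video.

```cpp
#include <functional>
#include <vector>

// Hypothetical names, for illustration only.
// A "shape" here is just a closure that knows how to compute its own area.
using area_fn = std::function<double()>;

area_fn make_square(double side) {
    return [side] { return side * side; };
}

area_fn make_rectangle(double width, double height) {
    return [width, height] { return width * height; };
}

// Sums the areas without knowing anything about concrete shape types.
double total_area(const std::vector<area_fn>& shapes) {
    double sum = 0.0;
    for (const auto& area : shapes) {
        sum += area();
    }
    return sum;
}
```

Note that `std::function` still performs an indirect call, so this sketch is about reducing the amount of code and ceremony, not about the dispatch cost discussed elsewhere in the thread.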
> I don't think their critique is really focused on the nuanced differences between the various schools of "how to make your code nicer to read" (at least, I didn't read it that way).
It wasn't indeed, but you'll note his code ended up being quite a bit smaller than the original. All those abstractions are nice when they make your code shorter, or better organised but in this case they just didn't. We could blame the toy nature of the example, but still: all those one liners were terrible for the API/implementation ratio, it's no surprise they could be fused together so concisely.
Bob Martin should back his claims up. He asserted many things in his book, without evidence, and with examples so bad most of them actually hurt his case. "Clean Code" is just a bad book, best ignored. Great speaker, though.
---
As for SOLID, there's one good thing: Barbara Liskov. Her principle has mathematical underpinnings in type theory, and shapes Haskell's type classes and likely Rust's traits too. The rest however ranges from situational to just crap.
Single responsibility is at best a heuristic for the real goal: keeping a nice and small API/implementation ratio. And it fails way too often, causing you to make tiny classes and one liner functions, whose implementations are so tiny they don't even pay for their interface. Pretty bad overall.
The Open/Closed principle is just crap. Don't use inheritance if you can help it, and don't bother with keeping your code open for this or closed for that. Just keep it simple, so that when requirements change you can rewrite the parts you need to rewrite.
Interface Segregation is a situational heuristic. Just keep your interfaces small, and you'll know when it makes sense to split an API in two or not.
Finally, Dependency Inversion is cancer. I mean that literally: it causes your code to grow unsightly appendages, makes everything it touches a tad bigger and more complex, and in most cases it doesn't even facilitate testing. Because surprise, the overwhelming majority of the time, code dependencies are fixed. So let them be. Don't complicate your program with interfaces that only have a single implementation. Let your code depend on the implementations directly. It will be simpler, easier to navigate, easier to modify, and just as easy to test.
That's not "failing" that's at most "it doesn't work for me", but without explaining why it doesn't work for you, it really is little more than a wild claim.
I've had great success with "Clean code" practices. Though I implement it more according to Alistair Cockburn's "Hexagonal Architecture", it's overall very similar and strives for the same goals using the same methods. So in that sense, N=1 it hasn't failed. It has helped at least one person.
Can't do much better in short HN comments. I've tried at times to be more rigorous than that, but so far I've only scratched the surface: https://loup-vaillant.fr/articles/good-code
I've seen clean code lead to over-architected and unmaintainable nightmares that ended up with incorrect abstractions that became much more of a problem than performance.
The more the years pile up, the more I agree with the sentiment in this post. When I go for something that works and is about as optimal as what I would get if I came back later to make it more performance oriented, I generally end up with something as simple as I can get. Languages and libs are abstract enough in most cases, and any extra design patterning and abstracting is generally going to bite you in the ass more than it's going to save you from business folks coming in with unknown features.
I suppose, write code that is conscious of its memory and CPU footprint and avoid trying to guess what features may or may not reuse what parts from your existing routines, and try even harder to avoid writing abstractions that are based on your guesses of the future.
The table version also has something really interesting going for it: it just begs to be thrown out and replaced should new requirements come in that don't fit with that design. If a new type of shape comes in that doesn't fit with the factor * width * height model (e.g., a trapezoid), we'd need to go back to the drawing board and figure out how to make the existing and new cases work harmoniously together.
On the other hand, an abstract base class broadcasts the message that we should fold our new cases into the existing design -- that's what base classes are made for. But even when new cases don't fit neatly in the existing design, we often feel as programmers that we need to pay respect and deference to existing design, especially if it was made with extensibility in mind. And so we add more complexity (maybe we need an extra field or an extra method), make more kludges, and soon enough the original OO design is a mountain of complexity and has so much "gravity" that it's nearly impossible to escape it anymore -- nobody can imagine throwing it out and starting fresh, so it just keeps gaining complexity. And for all its complexity, it's also slower.
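For readers who have not seen the video, the "factor * width * height" table design mentioned above can be sketched roughly like this. This is an illustrative reconstruction with made-up names (`shape_kind`, `area_factor`, `area`), not the article's actual code.

```cpp
#include <cstddef>

// Illustrative names; a sketch of the "factor * width * height" idea,
// not the article's actual code.
enum class shape_kind : std::size_t { square, rectangle, triangle, circle, count };

struct shape {
    shape_kind kind;
    float width;
    float height;
};

// One coefficient per shape kind, indexed by the enum value.
constexpr float area_factor[static_cast<std::size_t>(shape_kind::count)] = {
    1.0f,        // square:    side * side   (stored as width == height == side)
    1.0f,        // rectangle: width * height
    0.5f,        // triangle:  0.5 * base * height
    3.14159265f, // circle:    pi * r * r    (stored as width == height == r)
};

constexpr float area(const shape& s) {
    return area_factor[static_cast<std::size_t>(s.kind)] * s.width * s.height;
}
```

A trapezoid, whose area is 0.5 * (a + b) * h, needs three inputs and therefore has no natural slot in this table, which is the situation the comment describes: the design has to be revisited rather than extended.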
Much of this is very compiler dependent. For example, Java's compiler is generally able to perform more aggressive optimisations, and even virtual calls are often, and even usually, inlined (so if at a particular call-site only one shape is encountered, there won't even be a branch, just straight inline code, and if there are only two or three shapes, the call would compile to a branch; only if there are more, i.e. a "megamorphic" call site will a vtable indirection actually take place). There is no general way of concluding that a virtual call is more or less costly than a branch, but the best approximation is "about the same."
Having said that, even Java now encourages programmers to use algebraic data types when "programming in the small", and OOP/encapsulation at module boundaries: https://www.infoq.com/articles/data-oriented-programming-jav... though not for performance reasons. My point is that the "best practice" recommendations for mainstream languages do change.
> and even virtual calls are often, and even usually, inlined.
Last time I checked it could not inline megamorphic call sites, even if implementations were trivial (returning constants).
At the same time I saw C++ compilers able to replace an analogous switch that dispatched to constants with a simple array lookup, with no branching at all.
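As a small illustration of the kind of code being described, here is a switch whose arms only return constants, next to the hand-written array lookup it is equivalent to. The names are hypothetical, and whether a particular compiler actually turns the first form into the second depends on the compiler and optimization level; that is not verified here.

```cpp
#include <cstddef>

enum class shape_kind : std::size_t { square, rectangle, triangle, circle };

// Hypothetical example: a switch whose arms only return constants...
int corner_count_switch(shape_kind k) {
    switch (k) {
        case shape_kind::square:    return 4;
        case shape_kind::rectangle: return 4;
        case shape_kind::triangle:  return 3;
        case shape_kind::circle:    return 0;
    }
    return 0; // not reached for valid enum values
}

// ...and the equivalent array lookup, written out by hand.
int corner_count_table(shape_kind k) {
    static constexpr int corners[] = {4, 4, 3, 0};
    return corners[static_cast<std::size_t>(k)];
}
```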
If I'm not mistaken, C2 inlines up to three targets, but of course, as a JIT, it inlines more aggressively than an AOT compiler, as it does not require a soundness proof.
Clean code to me is like good writing. It should be easy to read and comprehend later. The rules are not rules but guidelines.
I think OO centric rules are harmful in a world where languages support functional programming etc. Polymorphism isn’t always the best answer. A big nested if statement that reads like the business spec can be easier to follow and reason about.
That aside if easy to understand code makes your app a bit slower, you profile to work out why and fix up the bits that matter making the tradeoff where it is needed.
Writing code for what you think will be performant everywhere, and not caring about readability in the process, is a fool's errand, at least in most SaaS/Web/Business apps.
And another similarity to good writing - sometimes you really do need to have really dense, precise stuff that isn't easy to understand without a close read. You put that into an appendix when writing, and into optimized low level routines/libraries when writing software. That way you can make things readable, but still have the "performance".
Yep.. I much prefer 2000 lines of code in one function with very few calls. Easier to reason about, fewer bugs, easier to maintain. It's very easy to indicate when a new part is coming, just enter a big comment block of what you're doing.
This is the perfect example. You start out like that and all is fine, but two years, a few changes and some bugfixes later your now 2700 line method is a total zombie. The comment blocks are lying and you have incomprehensible dependencies. Moreover, you have to read it all every time you make a change in the method or its inputs, or need to understand its output. That just screams 'refactor me!'.
The exceptions of course are when initializing a lot, or having a ton of bindings, etc. No need to cram this into sub-methods like it's a religion.
Cramming code may be fine for high-performance code where the man-days are justified for the two-line code fix. But in most software you likely just want to include the new cache methods, change the used object, add a button or fix an update. And the person fixing it probably hasn't seen the specific code ever before. There is sadly no need for high-performance software when you can't sell it in time.
If you want logical separation, add sub-blocks to the function. You can see things as being a spectrum from fully inline code, to code blocks, to anon functions defined in the function, to true functions.
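The spectrum described above can be shown in one small C++ function. This is only a sketch with invented names (`mean`, `summarize`) to show the four forms side by side; it is not meant as a recommendation for where to draw the line.

```cpp
#include <numeric>
#include <vector>

// A "true function": visible to, and reusable from, other code.
double mean(const std::vector<double>& xs) {
    if (xs.empty()) return 0.0;
    return std::accumulate(xs.begin(), xs.end(), 0.0) / static_cast<double>(xs.size());
}

// Hypothetical example that mixes all four forms.
double summarize(const std::vector<double>& samples) {
    // Fully inline code: plain statements in the function body.
    double result = 0.0;

    // Sub-block: scopes its temporaries without introducing a new name.
    {
        double total = 0.0;
        for (double s : samples) total += s;
        result += total;
    }

    // Local lambda: named and reusable, but only visible inside this function.
    auto clamp_non_negative = [](double x) { return x < 0.0 ? 0.0 : x; };
    result = clamp_non_negative(result);

    // Call to a true function.
    return result + mean(samples);
}
```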
Still better than nested classes and 50 different places where the code is. There are plenty of places where it makes more sense. Also in 'normal' code. It's about readability and maintainability. Classes mean -> extensibility, and that is by definition more complex. Factoring out some functions makes sense, yes, but not the general mantra "if it's 5 lines it should be a function".
As Uncle Bob says (paraphrasing): Dependency injection (java) is just programming XML where you turned compiler errors into runtime errors
It is also important to consider that better performance also increases your productivity as a developer. For example, you can use simpler algorithms, skip caching, and have faster iteration times. (If your code takes 1min to hit a bug, there are many debugging strategies you cannot use, compared to when it takes 1s. The same is true when you compare 1s and 10ms.)
In the end, it is all tradeoffs. If you have a rough mental model of how code is going to perform, you can make better decisions. Of course, part of this is determining whether it matters for the specific piece of code under consideration. Often it does not.
This is definitely not something that should be overlooked. Choosing a more mathematically optimal algorithm might be 2-3x faster in theory (at the cost of more complexity). If you're executing that algorithm a lot, to the point where a 3x speedup is significant, well -- if you can restructure the code in a manner similar to that demonstrated in the article (avoiding costly vtable dispatches, indirection, etc) and achieve a 25x with the original simpler algorithm, then that's something worth taking into consideration. A 3x algorithmic improvement is only impressive if there isn't 25x potential speedup low-hanging fruit (from simply not writing your code in a moronic way in the first place).
Yes.
When I worked as a game engine programmer, two of the first things I did were to improve the speed of compiling and starting the games.
When those two things are faster, all development will be faster and more fun.
The no.1 piece of advice I give to junior programmers now, or any programmer trying to improve, is to care about your code, everything else can naturally and more safely emerge from that one principle.
The problem with laying down a bunch of arbitrary rules is that they never apply to all scenarios. As the person coming up with the rules you can easily re-evaluate where and when they don't work, but the novice receiving those rules won't necessarily have an intuition for the reasoning behind them yet, and so won't so considerately apply them. For everyone else, they need to understand that there is no silver bullet, no 10 commandments that will give them the best result, life is messy, and they need to think, develop their own intuitions by interrogating their own code in each new context - but it all starts with caring about your code, not being satisfied with a pile of spaghetti, or a pile of OOP just because OOP, or a pile of strictly pure functions just because FP. Every single rule or programming pattern is wrong given enough contexts, it's all subjective.
Discussing patterns and rules is useful, but only if they are only used as a mental anchor to think about them, not some kind of axioms of programming correctness.
> Our job is to write programs that run well on the hardware that we are given.
I actually believe "the hardware that we are given" is the entire root of the problem.
Most programmers work and test using whatever hardware is current at the time, but this makes them blind to possible performance issues.
Take whatever you're working on, and run it on the hardware of 5-10 years ago. If you still have a good experience, you're doing it right. If not, you should probably stop upgrading developer machines for a while.
Whatever your minimum hardware requirements are should determine your development machines. This way, you will naturally ensure your low-end customers have a good experience while your high-end customers will have an even better experience.
My game studio has been doing this for years. It saves money on expensive hardware, it prevents performance issues before they arise, and it saves developer time by not having to overthink optimization.
I don't think the "job description" is accurate though. Most programmers' job is to write code that runs well _enough_ on the hardware that we are given. What "well enough" means depends heavily on whether you are working on firmware, a game engine or a web application. If performance isn't important, you end up with slow software.
"Use subclasses over enums" must be some niche advice. I've never heard it. The youtuber seems to be referring to some specific example (he refers to specific advice from "them") so I guess there's some context in the other videos of the series.
re: the speedup from moving from subclassing to enums - Compiler isn't pulling its weight if it can't devirtualize in such a simple program.
re: the speedup from replacing the enum switch with a lookup table and common subexpression - Compiler isn't pulling its weight if it can't notice common subexpressions.
So both the premise and the results seem unconvincing to me.
Of course, he is the one with numbers and I just have an untested hypothesis, so don't believe me.
Saying things akin to "your compiler is bad" because it doesn't optimize stuff like this is a cop out.
For one: Most compilers for most languages are bad by that metric, and interpreters don't even get to play. So this is not helpful for the vast majority of people. Waiting around for them to become good is not a viable option.
Second, say the compiler performed well in this scenario. Cool, let's go up a notch, or two, and it would start performing badly again, because there are limits to what it can do in a reasonable amount of time.
And if that limit were big that maybe wouldn't matter, but the limit is low, and so it does. Real programs are so much more complex than this example, that even if the compiler got 10 times better, it would still fail to optimize large parts of your real program.
A compiler can only devirtualize a virtual call if it knows the underlying object’s type at compile time, which it certainly won’t for a runtime populated array of base class pointers.
CSE doesn’t work across function boundaries unless those functions are inlined, which won’t happen with virtual functions due to the above.
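The distinction drawn above can be sketched in C++ as follows. The class names echo the `shape_base` mentioned in the thread, but the code is a hypothetical reconstruction, and the comments describe what an optimizer is allowed to do, not what any specific compiler is guaranteed to do.

```cpp
#include <memory>
#include <vector>

struct shape_base {
    virtual ~shape_base() = default;
    virtual float area() const = 0;
};

struct square : shape_base {
    explicit square(float s) : side(s) {}
    float area() const override { return side * side; }
    float side;
};

struct rectangle : shape_base {
    rectangle(float w, float h) : width(w), height(h) {}
    float area() const override { return width * height; }
    float width;
    float height;
};

// The dynamic type of `s` is known to be exactly `square`, so the call
// can be resolved at compile time and is a candidate for inlining.
float area_of_local_square(float side) {
    square s(side);
    return s.area();
}

// The container is filled at run time, so each element's dynamic type is
// unknown here. Each call goes through the vtable unless the optimizer
// can prove more, and the bodies are not available for inlining or CSE.
float total_area(const std::vector<std::unique_ptr<shape_base>>& shapes) {
    float sum = 0.0f;
    for (const auto& s : shapes) {
        sum += s->area();
    }
    return sum;
}
```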
I think "them" is someone named Robert Cecil Martin, but I'm not sure if this example from the video appears in his book.
What compiler are you using that devirtualizes every class hierarchy? I suspect that Casey is using C++ so he may (unfortunately) have multiple translation units in his program.
Martin Fowler's "Refactoring" also has "Replace Conditional with Polymorphism". I like the intent of the book but I don't follow it 100%. Another part of that book that tripped me is where he calls the same function with the same argument multiple times instead of saving the result in a variable for reuse.
The existence of a refactoring in the Fowler refactoring catalog is not normative advice that it should always be applied. Refactoring also has “Extract Function” and “Inline Function” as refactorings. They can’t both be right…
Refactorings are moves you can make. Choosing when to make them is up to you. In fact, Fowler provides guidance along with each refactoring suggesting when it might be applicable (i.e., not always)
>I suspect that Casey is using C++ so he may (unfortunately) have multiple translation units in his program.
Yes, the video uses C++.
Obviously if one compiles a library then the compiler has no way of knowing that other subclasses of `shape_base` do not exist. My point is that when compiling a binary as they are doing for their video, the compiler knows that there are no other subclasses that it needs to cater to.
It might require LTO explicitly, of course. At the very least godbolt doesn't devirtualize without LTO [1], but godbolt itself breaks if I enable LTO [2] and I CBA to test locally right now.
Correct me if I'm wrong. Even if the compiler devirtualizes the calls, you still have the memory cost of storing the vtable pointer in each object instance (8 bytes per instance on a typical 64-bit target), which means you need to do more fetches from memory. Does CPU prefetching negate the cost of these additional memory lookups?
Even when the target is an app, said app might be loading DLLs at runtime which may contain other subclasses.
The fundamental problem here is that C++ doesn't have a good abstraction to represent visibility of public types, since any other translation unit - even across the DLL boundary! - can re-declare the type and then derive from it. The only way to constrain visibility is to use anonymous namespaces, and that only works if the type can be confined to a single unit (that C++ compilers seem to ignore the optimization opportunities here in practice perhaps indicates just how rare this actually is).
I like this post a lot, even if it's a somewhat contrived example. In particular I like his point about switch statements making it easier to pull out shared logic than polymorphic code does.
There's so much emphasis on writing "clean" code (rightly so) that it's nice to hear an opposing viewpoint. I think it's a good reminder to not be dogmatic and that there are many ways to solve a problem, each with their own pros/cons. It's our job to find the best way.
One of the points of "clean code" is to make it easy to find the hotspots and optimize those. Write the codebase at a very high level, with plenty of abstractions etc., and then optimize the 1% that really needs it. Optimizing a small piece of software does not go against clean code; on the contrary, it reinforces it: you can spend the time to optimize only what is necessary.
One of the main points of Casey's videos is that following the "clean code" mantras will make your code unoptimizable. You may delay performance considerations until the end, then fire your profiler, find that 1% that really needs it[0], ... and realize that you can't get more than 1.5x - 2x speedup without ripping out the core 10% of the codebase and rethinking it properly. Were you, however, to consider performance from the start, that 10% would've been designed around completely different abstractions, and already 10x faster in the unoptimized version.
"Clean Code" should be called pessimistic coding - a big part of it is to enable OK flexibility in any imaginable direction. But real-life code will not change in all possible directions - in fact, you can predict quite well roughly what can and cannot change. Writing for performance means, among other things, making things easier for both the human and the CPU by reducing flexibility in the unlikely directions.
In the toy example from the video: Casey's proposed alternatives baked in the assumption that the program is working with shapes which, for a given computation, all can fit a specific family of equations. Clean code will make it just as easy to add a square as to add a parametric spline surface. Casey's code will make the former trivial, the latter hard without redoing the entire shape-related code. It's a good tradeoff if you're making a program that mostly works with non-parametric simple polygons, because nobody will need parametric splines in it. On the off chance they will, they can pay for the extra effort - and in the meantime, your software is 20x faster than the equivalent "clean code" version.
--
[0] - This thinking alone is a problem. It's not the 1% that needs some optimization work. The entire user-interacting surface and everything downstream of it need it, which means effectively the entire program. You're free to set a cut-off point beyond which you don't care about "less important" features - but 1% seems far too early.
In my personal opinion, this is less of an argument of "clean code" vs "performant code" and it seems to be more of traditional "object oriented programming" vs "data driven design".
Ultimately though, data driven design can fit under OOP (object-oriented programming) as well, since it's pretty much lightweight, memory-conforming structs being consumed by service classes instead of polymorphing everything into a massive cascade of inherited classes.
The article makes a good argument against traditional 1980-90's era object oriented programming concepts where everything is a bloated, monolith class with endless inheritances, but that pattern isn't extremely common in most systems I've used recently. Which, to me, makes this feel a lot like a straw man argument, you're arguing against an incredibly out-dated "clean code" paradigm that isn't popular or common with experienced OOP developers.
One only really has to look at Unity's data driven pipelines, Unreal's rendering services, and various other game engine examples that show clean code OOP can and does live alongside performant data-driven services in not only C++ but also C#.
Hell, I'm even doing it in Typescript using consolidated, optimized services to cache expensive web requests across huge relational data. The only classes that exist are for data models and request contexts, the rest is services processing streams of data in and out of db/caches.
If there is one take-away that this article validated for me though, it's that data-driven design trumps most other patterns when performance is key.
The problem with the contemporary "clean code" concept is that the narrative that performance and efficiency don't matter has been pushed down the throat of all programmers.
Re-usability, OOP concepts or pure functional style, design patterns, TDD or XP methodologies are the only things that matter... And if you use them you will write "clean code". Even worse, the more concepts and abstractions you apply to your code the better programmer you are!
If you look at the history of programming and classic texts like "The Art of Computer Programming", "SICP", "Elements of Programming"... The concept of "beautiful code" appears a lot. This is an idea that has always existed in our culture. The main difference with the "clean code" cult is that "beautiful code" also used to mean fast and efficient code, efficient algorithms, low memory footprint... on top of the "clean code" concepts of easy-to-test and re-usable code, modularity, etc.
Strangely enough, even in this video, everyone notices how you're forced to make tradeoffs between coherence and cleanliness vs. performance-oriented code in languages like C++, while the Clean Code book used Java for its examples (so most of the performance work is pushed into the JVM) and SICP used Scheme (which is interpreted, or transpiled to C where you can optimize, or has a VM/JIT underneath).
I suspect the language of choice has something to do with that, and that C++ would benefit more from data-oriented design than from literate OOP or functional programming patterns taken from very different programming ethoses (the programming language is an interface to something else).
One of the most insane things I hear repeated constantly is that "servers are cheaper than programmers", implying that runtime efficiency doesn't matter, only developer efficiency.
Which is all well and good, until you need to hire all the network engineers, systems administrators, devops people, security staff, datacenter operations managers, database sharding engineers, etc to manage the 10x more hardware and network surface area you have to throw at your slow codebase.
Go full Cloud and you won't need to hire most of these roles ever, except in niche ultra-high performance cases. These roles will be externalized at AWS or Azure, super-paid to work for you 24/7.
At least in the majority of places I worked at, people cared about readability and maintainability rather than just some abstract notion of "clean code".
And perhaps I was lucky, but typically readability and maintainability are orthogonal to performance and efficiency. Sometimes readable will have optimal performance, sometimes not. Then it becomes a matter of tradeoffs.
The first entry even stated that they were guidelines and if you have a valid reason to deviate: discuss it and you'll get an exception.
The discussion thing was mainly for new coders. We had a library part of the code that was used by many programs. Optimizing parts of it for their use case, could make it unusable for the others who used it.
Communication is key. If you discuss, before implementing it, why you're making certain design decisions then everything goes a lot smoother. If there are objections, keep in mind that the worst case isn't throwing away your design and starting over. The worst case is implementing it and screwing over your fellow coders.
> Even worse, the more concepts and abstractions you apply to your code the better programmer you are!
I had an Android programmer who was eager to write clean code following GoF patterns, OOP and the rest of the fancy things senior developers usually do.
It ended up that the Android team, with 3 devs, needed 3x the time to develop the same feature as a single iOS engineer.
It's often said that Design Patterns are workarounds for limitations in the expressiveness in a language: for example, the Singleton pattern is only useful in languages where you can't pass a reference to an interface implemented entirely by static methods - or the Visitor Pattern is the workaround for a language not supporting Double-Dispatch - method call chaining is a workaround for not having a pipe operator, and so on.
You said you're targeting Android, which implies you were using Java, which has a reputation for both a rigidly inflexible language-design team and an ecosystem with more design patterns than a set of fabric swatches - that's not a coincidence.
But for iOS, they'd be using Swift, right? Swift's designers clearly decided they didn't want to be like Java: take the best bits of C# and other well-designed languages and don't be afraid to iterate on the design, even if it means introducing breaking-changes - but the result is a highly-expressive language that, as you've demonstrated, allows just 1 Swift person to do the equivalent of 3 Java people. Swift is an actual pleasure to use, but using Java today makes me weary.
(To be clear: Java was a fantastic language when it was introduced, but it simply hasn't kept up with the times, to its own detriment. It feels like it's falling behind more and more as time goes on - but that's going to be the fate of every programming language eventually, imo).
Well they weren’t a good senior developer then. Part of the art is knowing when to use the patterns. That's an incredibly important distinction, as it’s so easy for someone a bit green to see this and think that any sort of architectural thinking is useless.
> Well they weren’t a good senior developer then.
He wasn't. He was a mid-grade dev, but that's not important, because even senior devs can fall in love with overcomplication.
Reminds me of Go. At the beginning you make only simple moves. As one learns the game more complicated patterns emerge. Watching master games the lines again are simple and clear - in hindsight.
> the more concepts and abstractions you apply to your code the better programmer you are!
These contradict each other. XP very explicitly opposes introducing (unnecessary) abstractions: YAGNI, DTSTTCPW, etc. And TDD is a good tool for enforcing that, as you only get to write code that you have a failing test case for.
> TDD is a good tool for enforcing that, as you only get to write code that you have a failing test case for.
TDD encourages the use of mocks and unit testing to increase code coverage. And unit testing is especially dangerous. You write a test, then the program, so the test is helping you (the programmer). Selling the idea that the higher the test code coverage is the better and safer your code is. Not true at all. If your code doesn't have integration tests, for example, you will never know how it actually runs. If you mock everything, you are not really "testing" anything but your internal logic. Unit testing and code coverage just check that a code path has been run. But there are other tools like fuzz testing or mutation testing... Do you randomize the memory at every test run? Do you make sure that the CPU cache is cold or hot depending on the test? Good testing is hard.
Most unit tests are written to ease the development. After finishing the development, they are safe to delete. Because they don't add any real value as I understand it. I understand that a test is a business contract of something that MUST work in a certain way. Unless the contract changes, the test must never be removed or changed. TDD and exhaustive unit testing make the maintenance process harder because you don't know if a test is useful or not.
If you follow TDD, most unit tests are re-written all the time. Because they were not written to test a business or critical contract, they were originally written to help some programmer write some internal logic.
> TDD encourages the use of mocks and unit testing to increase code coverage.
No, it encourages reasonable decoupling, i.e. good design.
If you see yourself introducing mocks (I think you mean stubs, mocks are something more specific) everywhere, you are feeling the pressure, but avoiding the good design.
And of course that's important because it enables you to be courageous and refactor mercilessly. Which again is important because it enables you to Do the Simplest Thing That Could Possibly Work, and stick to YAGNI, because you know you can change your mind later.
> If you mock everything, you are not really "testing" anything but your internal logic.
That's the purpose of unit tests. They do not exclude the need to perform other kinds of test. Integration tests, contract tests, stress tests - all those will focus on different facets of a system.
> Most unit tests are written to ease the development. After finishing the development, they are safe to delete.
This is especially bad advice, unless no one will ever touch that codebase again.
I saw old unit tests highlight bugs that would have been introduced by new code many times over the years.
> TDD and exhaustive unit testing make the maintenance process harder because you don't know if a test is useful or not.
Then, as a developer, remove unit tests that became useless.
Code coverage is a measurement. If you turn it into a goal, it will become useless. If you have "useless" unit tests, it tells me that some unit tests were written as padding to move code coverage up.
> TDD and exhaustive unit testing make the maintenance process harder because you don't know if a test is useful or not.
TDD was later given the name BDD (Behaviour Driven Development) to emphasize that you are not testing, but actually documenting behaviour. What kind of behaviour are you documenting that isn't useful and why did you find it necessary to document in the first place?
> If you follow TDD, most unit tests are re-written all the time.
What for? If changing requirements mean that your behaviour has changed to the extent that your unit does something completely different, it's something brand new and should be treated as such. Barring exceptional circumstances, public interfaces should be considered stable for their entire lifetime and, at most, deprecated if they no longer serve a purpose.
The implementation beneath the interface may change over time, but TDD is explicit that you should not test implementation – it is not about testing – only that you should document the expected behaviour of any implementation that may carry out your desired behaviour.
> Selling the idea that the higher the test code coverage is the better and safer your code is.
I don't think someone that believes this has a good understanding of unit testing. You can easily get 100% coverage without testing anything at all!
Coverage is a great metric if it's predicated on high quality tests. Even then 100% coverage doesn't equal "safe". It means that a lot of effort has been put into understanding and testing internal behavior.
You still need higher order tests, arguably even more.
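To make the "100% coverage without testing anything" point concrete, here is a minimal sketch. Everything in it (`apply_discount`, the two check functions) is hypothetical and only for illustration: the first check executes every line of the function, so a line-coverage tool would report full coverage, yet it cannot fail; the second has the same coverage and actually catches the bug.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test; it contains a deliberate bug."""
    # Bug on purpose: the discount is added instead of subtracted.
    return price * (1 + percent / 100)


def coverage_only_check() -> None:
    # Executes every line of apply_discount, so line coverage is 100%,
    # but nothing is asserted: this passes no matter what is returned.
    apply_discount(100.0, 10.0)


def behaviour_check() -> None:
    # Same coverage as above, but this one fails until the bug is fixed.
    assert apply_discount(100.0, 10.0) == 90.0
```

Both checks would move a coverage number by exactly the same amount; only one of them tells you anything.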
I generally don't mock in my TDD-style approach to writing tests. If you have a dependent class, e.g. writing nested serialization logic for JSON objects, you shouldn't mock the inner classes or the JSON serialization classes.
Well written unit tests serve to provide regression testing to prevent bugs reoccurring and to keep existing functionality (e.g. support for reading existing data) working.
TDD is used as a way to help write tests for the API surface and usage of your classes, functions, etc.. You should generally avoid testing internal state as that can change.
For example, if you are writing a set class, the logical place to start is with an empty set -- that's because it is easy to define the empty logic, defining accessor functions/properties like isEmpty, size, and contains. The next logical step is adding elements (two tests: add a single element, add multiple elements). Etc.
Later on, you can change the internal logic of the set from e.g. an array to a hash map. You will keep your existing tests as they document and test your API contract and external semantics. Likewise, if your hash set uses another class like an array, or a custom structure like a red-black tree, you shouldn't mock that class.
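A rough sketch of what that looks like, assuming Python and made-up names (`ListSet`, `HashSet`, `check_set_contract`): the checks only go through the public API, so the same checks run unchanged against the array-backed version and the hash-map-backed one, and nothing is mocked.

```python
class ListSet:
    """First version: backed by a plain list (O(n) membership)."""

    def __init__(self):
        self._items = []

    def add(self, item):
        if item not in self._items:
            self._items.append(item)

    def contains(self, item):
        return item in self._items

    def size(self):
        return len(self._items)

    def is_empty(self):
        return not self._items


class HashSet:
    """Later version: backed by a dict used as a hash map."""

    def __init__(self):
        self._items = {}

    def add(self, item):
        self._items[item] = True

    def contains(self, item):
        return item in self._items

    def size(self):
        return len(self._items)

    def is_empty(self):
        return not self._items


def check_set_contract(make_set):
    """Checks written against the API only, never the internal storage."""
    s = make_set()
    # Start with the empty set.
    assert s.is_empty() and s.size() == 0 and not s.contains("a")
    # Add a single element.
    s.add("a")
    assert not s.is_empty() and s.size() == 1 and s.contains("a")
    # Add multiple elements, including a duplicate.
    s.add("b")
    s.add("a")
    assert s.size() == 2 and s.contains("b")
```

Calling `check_set_contract(ListSet)` and `check_set_contract(HashSet)` exercises both implementations with the same contract.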
> If you have more than one person working on a codebase; clean code matters a lot.
You can have fast and clean code, just not the Uncle Bob style of "clean code". Uncle Bob hijacked the meaning of cleanliness. It doesn't mean that code written like that is actually clean, in fact it's usually the opposite: Uncle Bob's clean code is NOT clean.
It's also worth noting that Bob's advice is for Java, which has relatively unusual performance-idiosyncrasies around polymorphism and function calls. Much of the advice becomes very poor in languages that aren't JVM-based.
One can argue that simple code is both fast and clean.
However, you can only measure how fast (or slow) your code is, so make it simple, aim for fast and hope it's clean (:
My comment was a jest (as suggested by the smiley).
What I take from Casey's post is that a simple non-pessimistic representation allows for efficient code. That is, using a table instead of a class hierarchy gives massive performance boost.
By comparison, "clever" loop unrolling doesn't give that much of a boost.
So we need a simpler representation. IMHO the table implementation is neither less readable nor less flexible than the class hierarchy. But it is less common in the code I am used to (in other words, it's not a widely used pattern).
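For anyone who hasn't seen the table style, here is a small sketch of the representation. It assumes, as a simplification, that every shape's area can be written as a per-kind coefficient times width times height (with a circle passing its radius for both); the names `Shape`, `AREA_COEFFICIENT`, `area` and `total_area` are made up for this sketch. It is in Python, so it only shows the shape of the code and says nothing about cycle counts.

```python
import math
from enum import IntEnum


class Shape(IntEnum):
    SQUARE = 0
    RECTANGLE = 1
    TRIANGLE = 2
    CIRCLE = 3


# One coefficient per shape kind, indexed by the enum value.
# Assumption: area = coefficient * width * height for every kind.
AREA_COEFFICIENT = (1.0, 1.0, 0.5, math.pi)


def area(kind: Shape, width: float, height: float) -> float:
    # A table lookup replaces the virtual call / per-class method.
    return AREA_COEFFICIENT[kind] * width * height


def total_area(shapes) -> float:
    # shapes is an iterable of (kind, width, height) tuples.
    return sum(area(kind, w, h) for kind, w, h in shapes)
```

Adding another shape that fits the same formula is one enum member plus one table entry; a shape that doesn't fit the formula is where this representation stops being cheap.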
That's the insight, I think. "Clean Code" tells you to use maximally pessimistic representation for everything, because everything could be extended in some way in every direction. Meanwhile, in the real world, you likely have a good idea what directions of evolution are possible, and which of them are even useful.
Casey's example shows you that, if you design your code to make use of those assumptions, you'll get absurd performance benefits for little to no loss in readability (and perhaps even a gain!).
Some may ask, "what if you're wrong with your assumptions?". Well, you pay a price then. Worst case, you may need to rip out a module, rethink the theory behind it, and rewrite it from scratch - likely forgoing some of the performance benefits, too. Usually, the price will be much smaller. Either way, it's still better than being maximally pessimistic from the start, and writing software that never had a chance of ever becoming good or fast.
> the narrative that performance and efficiency don't matter
That was the case during the 90s and the first decade of the 2000s. Just wait for Moore's Law to kick in and in 18 months your code will get faster by an order of magnitude for free.
Moore's Law concerns IC transistor counts, not actual overall performance, and especially not single-threaded performance: a 40-core CPU isn't going to make Windows twice as fast as a 20-core CPU.
Single-threaded performance has long-since effectively plateaued: it's 2023 now and a desktop computer built 10 years ago (2013) can run Windows 11 just fine (ignoring the TPM thing) - but compare that to using a computer from 2003 in 2013 (where it'd run, but poorly), or a computer from 1993 in 2003 (which simply wouldn't work at all).
This is not to say that there won't be any significant performance gains to come, such as with rethinks in hardware (e.g. adding actual RAM into a desktop CPU package, non-volatile memory, etc) but I struggle to see how typical x86 MOV,CMP,JMP instructions could be executed sequentially any faster than they are right now.
That article doesn't contradict my post, in fact it's basically the same thing I'm saying: look at fig2 (the timeline graphic) and the paragraph preceding it: it shows that the gains in "serial" HPC performance gave-way to massively-parallel gains sometime around 2010.
The rest of the article is concerned with how software today is still written for those "serial" processors in-mind and fails to take advantage of parallel computing hardware - but this is hardly a new nor controversial statement.
You started your post saying that there's no correlation between transistors and performance. First graph shows that is wrong.
My argument was simple: 20 years ago nobody cared about performance because of the strong correlation between Moore's Law and MFLOPS and performance in general.
Nowadays with multicore, individual CPU cores are becoming slower, not faster. Then we can't just wait 18 months for individual cores to get faster.
> You started your post saying that there's no correlation between transistors and performance.
No, that's not what I said: I'm saying that single-threaded performance is no-longer directly (let alone linearly) correlated with transistor count, and hasn't been for decades - but it's single-threaded performance that matters for most end-user applications on peoples' computers/smartphones/etc.
And Moore's Law makes no claims about performance increases either, only transistor density. It's literally the first paragraph of the Wikipedia article:
> Moore's law is the observation that the number of transistors in a dense integrated circuit (IC) doubles about every two years. Moore's law is an observation and projection of a historical trend.
The link to performance (any kind of performance: parallel or serial) was made incorrectly by another Intel executive, David House, but that assumption simply isn't true.
To summarize: *yes*: bleeding-edge IC transistor density generally doubles every 18-24 months, but this does not translate into any kind of performance-doubling in end-user applications.
In fact, on the contrary, perceived "performance" (however you define it) outside of parallelizable programs, has demonstrably stagnated.
A lot of people here seem to be saying that it's a spectrum between "clean code" and "performant code". Even the author alludes to that. Or that most code doesn't need to be fast.
I find that viewpoint concerning because the reality is this isn't really a dichotomy. Code can be both performant and clean (note clean is not the same as elegant).
One thing I think is confusing people is that the dogmatism about what's "idiomatic" is especially bad in OOP-heavy languages. This is especially bad in Java, where a fetishization of design patterns has led to codebases which are both ugly and unperformant.
The reality is software design needs to consider performance from the get-go. Sure there is such a thing as "premature optimization" but if you've determined performance is a goal then you should follow best practices for high performance from the get-go. That includes not trying to perform math on iterables of objects (since that prevents vectorization, because the data isn't contiguous), avoiding accumulations, not creating and destroying tons of objects, etc. This can all be done in a clean way! And low level code can be cleanly encapsulated so that other interfaces remain idiomatic and simple.
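A sketch of the "math on iterables of objects" point, with the caveat that it only illustrates the data layout: pure Python won't vectorize either version, and all the names here (`Rect`, `RectColumns`) are invented for the example. The first version keeps one heap object per rectangle; the second keeps each field in its own contiguous `array` of doubles behind a small class, which is the kind of layout a vectorizing compiler or an array library can work with.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Rect:
    width: float
    height: float


def total_area_objects(rects) -> float:
    # One object per rectangle: the fields live wherever each object
    # happens to be allocated.
    return sum(r.width * r.height for r in rects)


class RectColumns:
    """Same data stored column-wise: one contiguous array per field."""

    def __init__(self):
        self.widths = array("d")   # contiguous C doubles
        self.heights = array("d")

    def append(self, width: float, height: float) -> None:
        self.widths.append(width)
        self.heights.append(height)

    def total_area(self) -> float:
        return sum(w * h for w, h in zip(self.widths, self.heights))
```

The callers still see a small, ordinary interface (`append`, `total_area`); the layout decision stays encapsulated inside the class.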
A lot of people fret that this approach leads to "leaky abstractions" because implementation details inform the interface design. That just means you need to iterate on the interface design so it makes sense.
I find it amusing that many corporate dev teams pick C++ for its performance and low-level control, but then reject the kind of code Casey advocates for. It is extremely hard to convince them to consider these things (i.e., in this case, cache misses and branch mispredictions).
Now, even if we consider only a conservative 2x speed-up, I might not care whether my app starts up in 2s or 4s, but I do care whether my device's battery lasts for 20h vs. 10h.
You can optimize even further by creating a custom chip to compute the area of shapes in the order of billions per second. But what's the point? Where is the value?
Can't say it better than Knuth:
> We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.
Clean code has never been about performance, it's about the other people that will read your code – future you included. Performance of people is more valuable for any product than code performance[1]. Only when the performance becomes a bottleneck, or you want to optimize energy efficiency then sure, don't pass that 3% opportunity.
[1] I would argue that it still holds for products that need high(er) performance code like video games, embedded systems, particle physics, ... These products just happen to hit bottlenecks way faster and some have hard cutoffs (eg. 60 fps for a game). Still, not everything needs to be optimized to the extreme: the algorithm to sort an in-game inventory does not need to handle 4B+ items.
> The improvement in speed from Example 2 to Example 2a is only about 12%, and many people would pronounce that insignificant. The conventional wisdom shared by many of today's software engineers calls for ignoring efficiency in the small; but I believe this is simply an overreaction to the abuses they see being practiced by penny-wise-and-pound-foolish programmers, who can't debug or maintain their "optimized" programs. In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal; and I believe the same viewpoint should prevail in software engineering. Of course I wouldn't bother making such optimizations on a one-shot job, but when it's a question of preparing quality programs, I don't want to restrict myself to tools that deny me such efficiencies.
That paper was published in 1974 and yet it captures the mindset of many a programmer in 2023 perfectly. The part I like about this paragraph is the "easily obtained" sentence; I saw a comment from someone mentioning that they made a JavaScript program 10 times faster by replacing the common functional programming combinators (map, filter, reduce, etc.) with for loops. I think most of us would say such a change is easily obtained, does not make the code impossible to debug or maintain, and gives such a massive improvement that it should be a no brainer to reach for it.
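For reference, this is roughly the kind of rewrite being described. The original anecdote was about JavaScript; this is a Python stand-in with made-up function names, and it makes no claim about the size of any speedup (Python's `map` and `filter` are lazy, so the cost profile differs from JavaScript's array methods). Measure before and after in your own runtime.

```python
from functools import reduce


def sum_even_squares_chained(values) -> int:
    # Combinator style: filter, then map, then reduce.
    evens = filter(lambda v: v % 2 == 0, values)
    squares = map(lambda v: v * v, evens)
    return reduce(lambda acc, v: acc + v, squares, 0)


def sum_even_squares_loop(values) -> int:
    # Same computation as one plain loop with no callbacks.
    total = 0
    for v in values:
        if v % 2 == 0:
            total += v * v
    return total
```

Both functions return the same result for the same input, which is the property that makes this kind of change easy to verify.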
Good point, and also why the topic of performance is so prone to heated discussions. To paraphrase Knuth "97% of the time, don't optimize except if it's an easy 12% improvement". I don't think there exists a good heuristic that works each time or for all languages. The right decision is left (pun intended) to the programmer that has to live with it. For me the ideal decision took into account the performance tradeoffs of each level of abstraction and chose the "appropriate" one. Obviously, the "appropriate one" depends on an uncountable number of variables including future scaling issues, probability of refactoring/irrelevance, cost/reward of spending the time to optimize the function, ...
A 10000x performance gain could still be insignificant, see Amdahl's law.
That's why you need to understand the bottlenecks in your system before you go around optimizing things, as it could be pointless or even counterproductive.
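Amdahl's law in a few lines, as a sketch (the function name and parameter names are mine): if a fraction `fraction` of the runtime is sped up by a factor `local_speedup`, the overall speedup is 1 / ((1 - fraction) + fraction / local_speedup).

```python
def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup when `fraction` of the runtime gets `local_speedup` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)
```

With these definitions, `amdahl_speedup(0.01, 10000)` comes out at about 1.01: a 10000x gain on code that accounts for 1% of the runtime makes the whole program roughly 1% faster. `amdahl_speedup(0.9, 10000)` is just under 10, because the untouched 10% now dominates.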
No, I don't actually agree with this at all. Sometimes -- a lot of the time actually -- you can have a pretty good idea ahead of time what will be fast and what will be slow. If you design the system before thinking about this, you can paint yourself into a corner and lock-in a fundamentally slow architecture, just like you can lock yourself into a fundamentally un-maintainable architecture.
Like, when you're choosing a big data structure that is going to have lots of by-value lookups, you don't implement it as an array with O(n) lookup first and only move to a hashtable after benchmarking it. That would be absurd. You just use a hashtable with O(1) lookup right at the start. Because in the overwhelming majority of cases that's the right thing to do and it doesn't need any justification. And the reason you can do that is because you're an engineer and you know a damned thing about the domain you're working in.
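In code, the choice looks something like this (a sketch with hypothetical record shapes and function names): the scan walks the whole list for every lookup, while the index is built once and each lookup after that is a single hash probe on average.

```python
def find_by_id_scan(records, wanted_id):
    # O(n) per lookup: walk the list until the id matches.
    for record in records:
        if record["id"] == wanted_id:
            return record
    return None


def build_index(records):
    # Built once; afterwards index.get(wanted_id) is O(1) on average.
    return {record["id"]: record for record in records}
```

Usage is `index = build_index(records)` followed by `index.get(some_id)`. The dict version is no harder to read than the scan, which is why picking it up front doesn't need a benchmark to justify it.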
Structural engineers don't build a skyscraper out of paper mache first, and then rebuild it in concrete when it collapses in a stiff breeze. They just build it out of concrete the first time around.
What so many people thoughtlessly call "premature optimization" isn't "premature" at all, it's just "knowing about the problem and knowing how the computer works". When you deliberately ignore this, what you're actually doing is "premature pessimization". Coding as if you don't know the difference between cache and RAM in 2023 is like coding as if you don't know the difference between RAM and disk in 1973. It's negligence. You know better!
And look at what the Clean Code people want you to do instead. All these rules are geared towards future extensibility. Is that not a form of premature optimization? But optimizing for extensibility, not speed. And you end up with codebases littered with abstract interfaces that only have (and will only ever have) one implementation. But you pay for that premature abstraction in both cognitive load and CPU load. You ain't gonna need it!
Obviously, if you are doing performance-critical code in a performance-critical application, you will be doing stuff like inlining and other things that "break" clean-code "rules".
I put that into quotes, because to me personally these aren't strict rules - but rather guidelines. And they aren't meant to be pushed to the absolute extreme - but rather be seen as methods/tools used to achieve the actual goal: easily readable, maintainable and modifiable code.
And in my workplace, "performance" isn't measured in cpu-cycles, but rather in man-hours needed to create business value. Adding more compute power comes cheaper than needing more man-hours.
For the most part, it still seems to be a good idea to train new developers to know and understand clean code. It will help them produce more stable, less buggy and more readable code - and that means the code they write will also be easier to optimize for performance, if necessary. But with my work, that sort of optimization seems only ever necessary for very small pieces of code - most definitely not the entire code base.
I don't think there is a contradiction or surprising point here.
At least my understanding of the case for clean code is that developer time is a significantly more expensive resource than compute, therefore write code in a way which optimises for developers understanding and changing it, even at the expense of making it slower to run (within sensible limits etc etc).
Yeah you're right, nobody ever said to me "use interfaces because the code is faster".
On the other hand, when I studied dynamic dispatch and stuff like that, I don't think enough people told me "when you do that, you are making this tradeoff".
I feel like it's worth sharing this kind of knowledge in order to make better informed decisions (possibly based on numbers). There's no need to become an extremist in either direction.
The trade off has changed over time as memory access has become a bigger and bigger bottleneck. But caring about this is still "premature optimisation".
The claim of 20x program performance difference is overblown. Compilers can often remove virtual function calls, JITs can also do it at runtime. Virtual function calls in a tight loop are slow but most of your program isn't in a tight loop and few programs have compute as a bottleneck.
Measure your program, find the tight loops in your program and optimise that small part of your program.
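As a minimal sketch of "measure first", assuming a hypothetical helper called `timeIt` (my name, not something from the article), a suspected hot loop can be timed with `std::chrono::steady_clock` before anyone rewrites it. A real profiler gives far better data; this only shows the idea.

```cpp
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical helper: runs `work` once and returns the elapsed wall time
// in nanoseconds, measured with a monotonic clock.
template <typename F>
std::int64_t timeIt(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Example workload standing in for a suspected tight loop.
inline std::int64_t sumVector(const std::vector<int>& values) {
    std::int64_t total = 0;
    for (int v : values) total += v;
    return total;
}
```

Single runs are noisy, so in practice you would repeat the measurement many times and look at the distribution rather than one number.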
I think this is hilarious when talking about someone like Muratori, who has had his hands in more games' code than most people on this site have ever even played. He is incredibly productive by comparison to all the people making excuses why their programs are slow bloated garbage fires.
The “developer time is valuable” mantra is thrown left and right, disregarding how much of that valuable resource will be wasted down the line due to bad implementations.
If we optimize for developer time, let’s optimize across the software’s entire lifecycle, not just that first push of a MVP to production.
This is a good point but it gets complicated as you often don't know the software's entire lifecycle so you have to optimize for something slightly different.
> developer time is a significantly more expensive resource than compute
This also presupposes that making a fast program is a lot more work. However poor performance is usually due to negligence rather than a lack of optimization effort. All you need to write reasonably fast code by default (without micro-optimizing) is:
1. a good knowledge of available algorithms
2. a good understanding of the problem
Point 1 is a one-time investment on the programmer's part that benefits all future programs they write. There is no marginal cost to being familiar with what's available in <algorithm>. Point 2 has a marginal cost, but it's probably a time saver anyway. Measure twice, cut once.
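To make the "know what's in <algorithm>" point concrete, here is a small sketch with two hypothetical functions (the names are mine). Both count how many queries occur in a list; the second sorts once and then uses `std::binary_search`, which turns each lookup from O(n) into O(log n) without any micro-optimization.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Naive version: one linear scan per query, O(n) each.
std::size_t countHitsLinear(const std::vector<int>& haystack,
                            const std::vector<int>& queries) {
    std::size_t hits = 0;
    for (int q : queries) {
        if (std::find(haystack.begin(), haystack.end(), q) != haystack.end()) ++hits;
    }
    return hits;
}

// Sort a copy once, then do a binary search per query, O(log n) each.
std::size_t countHitsSorted(std::vector<int> haystack,
                            const std::vector<int>& queries) {
    std::sort(haystack.begin(), haystack.end());
    std::size_t hits = 0;
    for (int q : queries) {
        if (std::binary_search(haystack.begin(), haystack.end(), q)) ++hits;
    }
    return hits;
}
```

Both versions read about equally well, which is the point: picking the better algorithm here costs no readability.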
The context for the author is game development. A game can't just spin up a few more servers in the cloud. The slower the code, the smaller the customer base. Some fields have more severe constraints, where developer time may be expensive, but compute is priceless.
(This is not to endorse writing code that is harder than necessary to understand, or failing to document the parts that are necessarily hard.)
Not to mention the users. For all the bullshit the marketing departments of every company spew about valuing their customers, software companies don't really give a damn about the many person-hours of users' lives they waste to save a person-minute of dev time.
Everything is a balance though, and the tradeoff may not be linear. Where they've landed on it may be the only way to get the software to customers at all in a cost effective manner.
I prefer a much simpler rule: if it's easy for the CPU to execute, it's likely easy for you to read too. That means: no deep nesting, minimise branchiness (indirect calls are the worst), keep the code small and simple.
Unrelated to the content itself, am I the only one wondering if he has his t-shirt mirrored or if he's really skilled at writing right-to-left?
Content wise: his examples show such increases because they're extremely tight and CPU-bound loops. Not exactly surprising.
While there will be gains in larger/more complex software by throwing away some maintainability practices (I don't like the term "clean code"), they will be dwarfed by the time actually spent on the operations themselves.
Just toss a 0.01ms I/O operation in those loops; it will throw the numbers off by a large margin, then one would just rather pick sanity over the speed gains without blinking.
That said, if a code path is hot and won't change anytime soon, by all means optimize away.
He created the t-shirts specially for StarCode Galaxy[1], which is a longer-form class for C++ programming in the same vein as the current video, but with a much wider scope. (As far as I know SCG is not released yet.)
As a small, amusing s(n)ide note, Casey's rants about the slowness of the Windows terminal[2], which ended up with Microsoft releasing an improved version[3], were based on him wanting to implement a TUI game as an exercise in SCG and the terminal being too slow.
>Just toss a 0.01ms I/O operation in those loops; it will throw the numbers off by a large margin, then one would just rather pick sanity over the speed gains without blinking.
I mean, yes, if you do something completely fucking idiotic like put an IO operation inside a tight calculation loop, then all your speed gains will vanish. But I don't see how that refutes anything.
I said I/O, but it could be memory access. All he does fits in registers and maybe cache. Also, 0.01ms was the normal RAM latency when I started using computers...
Do you remember the last time you had to do a tight calculation loop, and not only that, but one that significantly impacted the total runtime? Personally I do, it was roughly 15 years ago writing a raytracer.
I can imagine that happening in game/3D dev, DSP, emulators, ML, and maybe some other types of software, but even in those cases one already has dedicated libraries and hardware to extract the performance from where it can be extracted.
I mean, Python is slow as an old dog, yet it gets most of the ML fun.
There are lots of writings about technical and architectural reasons code in games performs better than code in GP applications, but people often forget the top level reason: they have clear performance targets right from the start and performance is the most obvious thing (right after "not crashing") that you see about how well a game works.
Everything follows from this. It's not that game devs are so much cleverer than other devs, they are just faced with first hand feedback of "does the game code hit the frametime budget" constantly and the whole dev org is committed to that.
1. Prefer polymorphism to “if/else” and “switch” - if anything, that makes code less readable, as it hides the dispatch targets. Switch/if is much more direct and explicit. And traditional OOP polymorphism like in C++ or Java makes the code extensible in one particular dimension (types) at the expense of making it non-extensible in another dimension (operations), so there is no net win or loss in that area either. It is just a different tool, not a better or worse one.
2. Code should not know about the internals of objects it’s working with – again, that depends. Hiding the internals behind an interface is good if the complexity of the interface is way lower than the complexity of the internals. In that case the abstraction reduces the cognitive load, because you don't have to learn the internal implementation. However, the total complexity of the system modelled like that is larger, and if you introduce too many indirection levels in too many places, or if the complexity of the interfaces/abstractions is not much smaller than the complexity they hide, then the project soon becomes an overengineered mess like FizzBuzz Enterprise.
3. Functions should be small – that's quite subjective, and also depends on the complexity of the functions. A flat (not nested) function can be large without causing issues. Going to the other extreme is not good either – thousands of one-liners can also be extremely hard to read.
4. Functions should do one thing – "one thing" is not well defined, and functions have a fractal nature: they appear to do more things the more closely you inspect them. This rule can be used to justify splitting any function.
5. “DRY” - Don’t Repeat Yourself – this one is pretty good, as long as one doesn't do DRY by just matching accidentally similar code (e.g. in tests).
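A small sketch of the "extensible in types vs. extensible in operations" point from item 1, with made-up names. In style A a new shape is one new class, but a new operation touches every class. In style B a new operation is one new free function, but a new shape touches every switch.

```cpp
// Style A: virtual dispatch. Adding a shape = one new class;
// adding an operation = editing the base class and every derived class.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};
struct Square : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
};
struct Rectangle : Shape {
    double width, height;
    Rectangle(double w, double h) : width(w), height(h) {}
    double area() const override { return width * height; }
};

// Style B: tagged struct + switch. Adding an operation = one new function;
// adding a shape = editing every switch.
enum class Kind { Square, Rectangle };
struct FlatShape { Kind kind; double width; double height; };

double area(const FlatShape& s) {
    switch (s.kind) {
        case Kind::Square:    return s.width * s.width;
        case Kind::Rectangle: return s.width * s.height;
    }
    return 0.0; // not reached for valid Kind values
}

// The "new operation" added later: no existing type had to change.
double perimeter(const FlatShape& s) {
    switch (s.kind) {
        case Kind::Square:    return 4.0 * s.width;
        case Kind::Rectangle: return 2.0 * (s.width + s.height);
    }
    return 0.0; // not reached for valid Kind values
}
```

Neither style wins in general; which one is cheaper depends on whether the codebase grows more shapes or more operations.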
I think if someone takes any of these typical guidelines - clean, SOLID, REST etc - and mindlessly applies it they’re likely to end up with a few parts of their applications which look weird, or perform poorly or end up being worse somehow. This is because there are inevitably going to be situations where the guidelines don’t fit well - they’re not necessarily hard and fast rules after all.
Any time you have these common rules of thumb in any part of your life you need to evaluate whether or not they are appropriate. But just because they’re not infallibly universal, doesn’t mean they’re wrong, it just means life throws complex situations at us sometimes, and that we need to be pragmatic and flexible
> Prefer polymorphism to “if/else” and “switch” - if anything, that makes code less readable, as it hides the dispatch targets.
Worse, I think, is that it hides the condition, which can be arbitrarily far away in space and time.
Certainly dynamic dispatch can be very useful, and the abstraction can be clearer than the alternatives. A rule of thumb is to consider whether you'd do it if you were writing in C, using explicit tables of function pointers. If that would be clearer than conditional statements, do it.
(In case it isn't obvious, I'm talking about languages in the C++/Java vein here.)
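For anyone who hasn't written it by hand, this is roughly what the "explicit table of function pointers" version looks like in C-style code (it also compiles as C++). All names are illustrative. If this table reads more clearly than the equivalent conditionals would, that is a hint that dynamic dispatch is a reasonable fit.

```cpp
enum ShapeKind { KIND_CIRCLE = 0, KIND_RECT = 1, KIND_COUNT };

struct Shape {
    ShapeKind kind;
    double a; // radius for circles, width for rectangles
    double b; // unused for circles, height for rectangles
};

static double circleArea(const Shape* s) { return 3.141592653589793 * s->a * s->a; }
static double rectArea(const Shape* s)   { return s->a * s->b; }

// The dispatch table: one entry per kind, indexed by the kind tag.
typedef double (*AreaFn)(const Shape*);
static const AreaFn kAreaTable[KIND_COUNT] = { circleArea, rectArea };

// The "virtual call", spelled out: look up the function, then call it.
double shapeArea(const Shape* s) { return kAreaTable[s->kind](s); }
```

The condition is still hidden in whoever set `kind`, but at least the full list of targets sits in one visible place.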
In his small example he already added unforeseen couplings that could get out of hand if it was a big codebase. If you follow the principle "switch statements over [X]", try to add a new shape down the line and see how quickly you run into problems.
In the clean code version, your compiler will remind you to implement calculateArea, calculateNumberOfVertices, calculateWhatever, and so on and so forth.
With his version you have to add a new `case` to every switch statement and hope you didn't miss one with a default case, because the compiler won't catch it.
You don't need virtual functions and polymorphism to remove his switch statements. Just compose over inheritance.
> With his version you have to add a new `case` to every switch statement and hope you didn't miss one with a default case, because the compiler won't catch it.
The compiler not catching it is a limitation of the language he uses, not the limitation of the general concept of switch / pattern matching. Scala, Haskell, Rust do catch those.
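For what it's worth, C++17 can get reasonably close to this with `std::variant` and `std::visit`. The sketch below uses made-up shape types and the common `overloaded` helper idiom. If a new alternative is added to `Shape` and the visitor has no matching lambda, the `std::visit` call fails to compile. Separately, GCC and Clang can warn about an enum switch that misses an enumerator and has no default (`-Wswitch`).

```cpp
#include <variant>

struct Circle { double radius; };
struct Rectangle { double width, height; };

// Closed set of alternatives, similar in spirit to a sum type.
using Shape = std::variant<Circle, Rectangle>;

// Common C++17 idiom for building one visitor out of several lambdas.
template <class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template <class... Ts> overloaded(Ts...) -> overloaded<Ts...>;

// Every alternative of Shape must be handled here, or this does not compile.
double area(const Shape& s) {
    return std::visit(overloaded{
        [](const Circle& c) { return 3.141592653589793 * c.radius * c.radius; },
        [](const Rectangle& r) { return r.width * r.height; }
    }, s);
}
```

A catch-all lambda taking `const auto&` would silence the check again, which is the same trap as adding a default case.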
> If you follow the principle "switch statements over [X]", try to add a new shape down the line and see how quickly you run into problems.
And who said you'd ever need to add a new shape? Maybe you will need to add a new operation? Try to add a new operation `calculateWhatever` and see how many places of the code you need to change instead of just adding one new function with a switch.
Often you really don't know in which direction the code will evolve. Most often you can't guess the coming change, so don't make the code more complex now in order to make it simpler in the future (which may never come).
>The compiler not catching it is a limitation of the language he uses, not the limitation of the general concept of switch / pattern matching. Scala, Haskell, Rust do catch those.
You're thinking of defaultless switch statements. Rust and TypeScript catch those issues as long as you don't add a default. I'd already thought of it when I typed it.
>And who said you'd ever need to add a new shape?
ô_o
>Maybe you will need to add a new operation? Try to add a new operation `calculateWhatever` and see how many places of the code you need to change instead of just adding one new function with a switch.
You do realize that your "one function with a switch" does have to cover every case exactly like the clean code version, it's not magic you're just fitting it all into one switch/case, and you'll probably end up extracting those case: blocks into separate functions anyway.
And on top of that you're not safe from a colleague adding a useless "default" case at the end out of paranoia and making your compiler not catch a future problem when a new shape gets added.
You are not safe from a colleague adding a dozen classes to add two numbers if you are enabling the "clean" crowd either. When I was at uni I was obsessed with these clean ideas and how they would allow big teams to work. After years of working, I have seen these ideas fail to actually make teams work in a harmonious manner. The hardest hit to these ideologies I have experienced was finally working in a place without them and seeing how many of the alleged "advantages" of "clean code" could be achieved by other, simpler means, and how much these ideologies get in the way of actually being productive.
Productivity is achieved in spite of clean, not thanks to it.
>You are not safe from a colleague adding a dozen classes to add two numbers
The difference is, adding "default" cases to complete switch statements happens all the time, you've probably witnessed it, I know I have, whereas colleagues adding a dozen classes to add two numbers doesn't.
You're also never safe from meteorites crashing your building, let's talk about real antipatterns
> You're thinking of defaultless switch statements. Rust and TypeScript catch those issues as long as you don't add a default. I'd already thought of it when I typed it.
But the same problem exists with interfaces and virtual dispatch! If you provide a default implementation at the interface / abstract class level, then the compiler won't tell you forgot to implement that method, because it would see the default one exists.
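A short illustration of that failure mode, with hypothetical classes. Because `Shape::area` has a default body instead of being pure virtual, `Triangle` compiles cleanly even though nobody implemented its area, and it silently reports 0.

```cpp
struct Shape {
    virtual ~Shape() = default;
    // Default implementation: derived classes are no longer forced to override.
    virtual double area() const { return 0.0; }
};

struct Square : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
};

// Added later. The author forgot area(); the compiler has nothing to say.
struct Triangle : Shape {
    double base, height;
    Triangle(double b, double h) : base(b), height(h) {}
};

double areaOf(const Shape& s) { return s.area(); }
```

This is the virtual-dispatch twin of the stray default case in a switch.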
> he already added unforeseen couplings that could get out of hands if it was a big codebase
And one should be able to fix things that get out of hand when they start getting out of hand. Architecture should be based on current facts, not on our fantasies about the future.
For some reason we're always making this weird assumption that future engineers working on a problem are going to be less capable than us. So we decouple things in advance for them. Those future idiots won't know how to architect for scale, but we, today, with our limited domain expertise, know better.
The reality is that in most cases we are those future engineers. And, unsurprisingly, "future we" tend to know more, not less.
The word "clean" in reference to software is simply an indicator of "goodness" or "a pleasant aesthetic", as opposed to the opposite word "dirty", which we associate with undesirable features or poor health (that being another misnomer, that dirty things are unhealthy, or that clean things are healthy; neither is strictly true). "Clean" is not being used to describe a specific quality; instead it's merely "a feeling".
Rather than call code "clean" or "dirty", we should use a more specific and measurable descriptor that can actually be met, like "quality", "best practice", "code as documentation", "high abstraction", "low complexity", etc. You can tell when something meets those criteria. But what counts as "clean" to one person may not to another, and it doesn't actually mean the end result will be better.
"Clean" has already been abandoned when talking about other things, like STDs. "Clean" vs "Dirty" in that context implies a moral judgement on people who have STDs or don't, when in fact having an STD is often not a choice at all. By using more specific terms like "positive", or simply describing what specific STDs one has, the abstract moral judgement and unhelpful "feeling" is removed, and replaced with objective facts.
The originally submitted link was a YouTube video that's been deleted for some reason.
Probably a better link is the blog post because the author updated it with the new replacement video a few minutes ago as of this comment (around 09:12 UTC):
The example of using shape area seems like a poor choice.
First off, the number of problems where having an analytical measure of shape area is important is pretty small by itself. Second, if you do need to calculate area of arbitrary shapes, then limiting yourself to formulas of the type `width * height * constant` is just not going to cut it. And this is where the entire optimization exercise eventually leads: to build a table of precomputed areas for affinely transformed outlines.
Throw in an arbitrary polygon, and now it has to be O(n). Throw in a Bézier outline and now you need to tessellate or integrate numerically.
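To show what "now it has to be O(n)" means in practice, here is a sketch of the standard shoelace formula for a simple (non-self-intersecting) polygon. The `Point` type and function name are my own. Nothing about it fits a single `width * height * constant` table entry.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y; };

// Shoelace formula: area of a simple polygon given its vertices in order.
// One pass over the vertices, so the cost is O(n) in the vertex count.
double polygonArea(const std::vector<Point>& vertices) {
    const std::size_t n = vertices.size();
    if (n < 3) return 0.0; // fewer than three vertices encloses no area
    double twiceArea = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const Point& p = vertices[i];
        const Point& q = vertices[(i + 1) % n]; // wrap around to close the outline
        twiceArea += p.x * q.y - q.x * p.y;
    }
    // The sign depends on winding order, so take the absolute value.
    return std::fabs(twiceArea) / 2.0;
}
```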
What this article really shows is actually what I call the curse of computer graphics: if you limit your use cases to a very specific subset, you can get seemingly enormous performance gains from it. But just a single use case, not even that exotic, can wreck the entire effort and demand a much more complex solution which may perform 10x worse.
Example: you want to draw lines? Easy, two triangles! Unless you need corners to look good, with bevels or rounded joins, and every pixel to only be painted once.
Game devs like to pride themselves on their performance chops, but often this is a case of, if not the wrong abstraction, at least too bespoke an abstraction to allow future reuse irrespective of the use case.
This leads to a lot of dickswinging over code that is, after sufficient encounters with the real world, and sufficient iterations, horrible to maintain and use as a foundation.
So caveat emptor. Framing this as a case of clean vs messy code misses the reason people try to abstract in the first place. OO and classes have issues, but performance is not the most important one at all.
To summarize: if you make the problem complex enough, then the specific performance methods used in the article don’t work. But hey, why not take your complex example? If you apply a polymorphic approach to Bézier curves and polygons, then you still get a 1.5x slowdown compared to a switch statement. If you have any commonality between your implementations of area for them, it’s harder to find, which could be worth 2x or more. If there is a common algorithm for calculating all polygons and curves, wasting work for simple shapes, but vastly improving cache and predictor performance, then you could be leaving another 5x on the table. Orienting the code around operations instead of types is still a win for performance with similar cognitive load compared to type hierarchies.
You’re right that once you’ve done the 15x performance gain that Casey demonstrates, the code is pretty brittle and prone to maintenance problems if the requirements change a lot. But I think we can have our cake and eat it too by maintaining our code with the simple path available to fall back to if we get a complicated new requirement. Need to add in complex cases that weren’t thought of before? Add them to new switch cases that are slower, and then keep looking for more performant ways of calculating things if need be.
> You’re right that once you’ve done the 15x performance gain that Casey demonstrates, the code is pretty brittle and prone to maintenance problems if the requirements change a lot.
In some sense, it is a good thing. It creates natural backpressure to scope creep.
The "Clean Code" developer can add a new complex shape type to their program in 15 minutes; just subclass here, implement the methods, done, no changes to other code. No performance impact - the program remains as badly performant as it was before.
The Casey-style developer will take 15 minutes and come back to you with:
"Oh, we can make the calculations for this new shape approximate to, idk. +/- 50%, by adding another parameter to the common equation; this will, however, cause 1.1x performance drop across the board. We could make it accurate by special-casing, at the cost of 2-3x perf drop for the whole app. We can also spend a person-week looking into relevant branch of mathematics to see if there isn't a different equation that could handle the new shape accurately with no performance penalty."
"Or, you know, we could just not do it at all. Why exactly do we need to support this new shape? Is supporting it worth the performance drop?"
Whichever option the developer and their team choose, their software will still remain an order of magnitude more performant than the "Clean Code" style. Which is to say, those devs are aware of the costs - the "Clean Code" style is so ridiculously wasteful that such costs aren't even noticed in the first place.
Being able to handle arbitrary polygons isn't insane complexity, it's actually the minimum viable complexity needed to handle any basic shape a user might want to use.
Limiting them to a handful of precomputed ones is the kind of limitation only a software developer can live with and love.
And once you can handle arbitrary polygons, writing and maintaining special case code has to be justified with evidence that your code is spending enough time on area calculations that it's worth optimizing and maintaining extra lines of code for.
If your area of expertise is putting out a general purpose shape area compute library, then maybe. If your purpose is elsewhere and this code is a tool to accomplish that --- not so much. Not everything needs to be fully general. Write the tool you need, whatever that ends up being.
I wish software engineering cared a lot more that we have no way of measuring how clean code is. Much less any study that measures the tradeoffs of clean code and other concerns, like a real engineering discipline.
The funny thing is that the things that are not possible to measure will be undone all the time because people can't agree on how it should be.
This means that there will be wasted time.
First we have to write the code this way.
Next year we have to write it in the other way.
Then it has to be done in the first way again.
It's so hard to prioritize things when you ask someone why something has to be done the way they say and they are not able to give a real answer.
I can do my job when option A means faster program and option B means more memory usage but I can't do my job when option A means faster program and option B is just the way it "should" be done.
There's a tradeoff. Engineering time is expensive. Machine time can be expensive too. We need to optimize these costs by making most code that's not performance relevant easy to read and then optimize performance critical code paths while hiding optimization complexity behind abstractions. Either extreme is not helpful as a blanket method.
I think he's really underplaying the main selling point of clean code - the objective of writing clear maintainable, extendable code.
His code was faster, but sometimes how it compares for adding new features or fixing bugs by people new to a code base is where you want to optimize.
Should performance be talked about more? Yes. Does this show valuable performance benefits? Also yes. Is performance where you want to start your focus? In my experience, often no.
I've made things faster by simplifying them down once I've found a solution. I've also made things slower in order to make them more extendable. If you treat clean code like a bible of unbreakable laws you're causing problems, if you treat performance as the be-all-end-all you're also causing problems, just in a different way.
It's given me something to think about, but I wish it was a more even-handed comparison showing the trade-offs of each approach.
If someone is new to the codebase, would you rather they need to open a dozen files to see all of the different virtual functions that could occur at one call site, or open one file?
As tradeoffs go, I don't see opening a dozen files as a big hurdle. I'd take smaller single purpose files over one big file anyday.
Plus how often do you need to see all functions? If there's a problem in RectangleArea function, you go to that file, no need to look in CircleArea. If you're adding 'StarShapeArea' you only care that it matches the callers expectations, not worrying about the other function logic.
Honestly this itself is a massive oversimplification, and a bit of a strawman, and kinda indicates that you still aren’t seeing the nuance.
Yes, if something is understandable in a single file, that’s fine. But also appreciate that too many pieces in the same file can also be confusing and disorienting for people. You also have to consider all the cases in which someone would be opening that file.
You’re basically posing a situation which by its very description is an exception. If anyone is slavishly following these rules, they’re undoubtedly doing it wrong.
Knowing when to apply the patterns is the hard bit. And, in my experience working on codebases worthy of considering these things in the first place, there’s much more to consider than what you’re posing.
> There are far worse crimes than having code structure that spans several files.
Sure, but what's the advantage of having your code split over several files? Yes, you can jump between them with an IDE, but that's still a disruption, and it makes it harder to see common patterns that can help you simplify the code.
As demonstrated in the video, splitting up the switch into multiple classes only hurts readability.
You are only seeing a screenful of code at any time, no matter whether it is in one file or several. I much prefer each “section” of code, if you will, having a specific name/location I can associate it with, over scrolling like a madman. Yeah, I know about markers/collapsible functions/opening multiple sections of the same file/etc., but at that point whose workflow is more complex?
> I see your point, but no one uses Windows notepad for coding anymore.
Did I miss an IDE that inlines all those things for you automatically? Because ones that only give you a "jump to definition", or maybe a one-at-a-time preview in a context popup, are not much better than Notepad++. You still don't get to see all the relevant things at the same time.
Most real world applications won’t have `print “red”` or the like under a specific implementation. Good luck inlining six 1000-line implementations into a single switch statement.
The "project" vs. "external" vs. "system" library conceptual distinction exists for a reason. But of course you want presentation-level inlining to be semi-automated, much like autocomplete and jump to definition.
I didn't watch the video but I read the article, and in the later part he shows how to extend every version (polymorphic, enum and table-based) to "computing the sum of the corner-weighted areas".
This extension is easy enough in the non-pessimistic case (table-based).
> Main selling point of clean code - the objective of writing clear maintainable, extendable code
That's the alleged selling point, whether even that's true is arguable.
Since while performance can be clearly measured - whether the given "clear code" principles do in fact help with writing "maintainable and extendable code" is something that can be argued back and forth for a long time.
> So by violating the first rule of clean code, which is one of its central tenets, we are able to drop from 35 cycles per shape to 24 cycles per shape, implying that code following that rule is 1.5x slower than code that doesn’t. To put that in hardware terms, it would be like taking an iPhone 14 Pro Max and reducing it to an iPhone 11 Pro Max. It's three or four years of hardware evolution erased because somebody said to use polymorphism instead of switch statements.
The benchmark is a tight loop where the vtable lookup is a big chunk of the total computation. I don't think one can extrapolate this 1.5x improvement to real code. If anything, it represents an upper bound on the performance improvement you might expect to see.
I also didn't see anything about how the code was compiled. Various optimizations could affect performance in meaningful ways.
> The benchmark is a tight loop where the vtable lookup is a big chunk of the total computation. I don't think one can extrapolate this 1.5x improvement to real code.
No you can't. But other things get worse at a bigger scale. Not every program makes those virtual calls so pervasively that the overhead scales with the program, but many pay no attention to memory access patterns, so their instruction pointer jumps all over the place and thrashes the instruction cache. Mike Acton once showed that merely reordering objects by type, while keeping those virtual calls, can help the instruction cache quite a bit, just by making sure the same code is called several times in a row.
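A rough sketch of the "reorder by type, keep the virtual calls" idea, using hypothetical classes and my own `groupByType` helper (this is not Acton's code). It sorts the pointers by dynamic type so that consecutive iterations of the loop call the same `area()` override.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <typeindex>
#include <typeinfo>
#include <vector>

struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};
struct Square : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
};
struct Rectangle : Shape {
    double width, height;
    Rectangle(double w, double h) : width(w), height(h) {}
    double area() const override { return width * height; }
};

// Reorder so objects of the same dynamic type are adjacent in the vector.
// Only the pointers move; the objects themselves stay where they were allocated.
void groupByType(std::vector<std::unique_ptr<Shape>>& shapes) {
    std::stable_sort(shapes.begin(), shapes.end(),
        [](const std::unique_ptr<Shape>& a, const std::unique_ptr<Shape>& b) {
            const Shape& ra = *a;
            const Shape& rb = *b;
            return std::type_index(typeid(ra)) < std::type_index(typeid(rb));
        });
}

// The loop is unchanged and still uses virtual dispatch.
double totalArea(const std::vector<std::unique_ptr<Shape>>& shapes) {
    double total = 0.0;
    for (const auto& s : shapes) total += s->area();
    return total;
}
```

Whether this helps on a given workload is something to measure; with two tiny overrides like these it almost certainly won't show up.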
This seems more like an argument against the object oriented model of C++ than anything else. Would have been more interesting if the performance was compared to languages like Rust.
In this particular case I find the code optimized for speed (the one using switch) to be also more readable and simpler than the code using virtual dispatch.
The problem with virtual calls in a big project is that there is no good way of knowing what is the target of the call, without some additional tooling like IDE. But in case of a switch/if, it is pretty obvious what the cases are.
> Sure, but what happens, once you want to start supporting other shapes other than basics? Because clean code assumes code will be changed/maintained.
You get to push back and ask, is it worth the developer time and predicted 1.5 - 5x perf drop across the board (depending on shape specifics)? In some cases, it might not be. In others, it might. But you get to ask the question. And more importantly, whatever the outcome, you're still left with software that's an order of magnitude faster than the "clean code" one.
Clean code "assumes code will be changed/maintained" in a maximally generic, unconstrained way. In a sense, it's the embodiment of a "YAGNI" violation: it tries to make any possible change equally easy, at a huge cost to performance, and often readability. More performant code, written with an approach like the one Casey demonstrated, also assumes code will be changed/maintained - but constrains the directions of changes that are easy.
In the example from the video, as long as you can fit your new shape to the same math as the other ones, the change is trivial and free. A more complex shape may force you to tweak the equation, taking a little more time and incurring a performance penalty on all shapes. An even more complex shape may require you to rewrite the module, costing you a lot of time and possibly performance. But you can probably guess how likely the latter is going to be - you're not making an "abstract shape study tool", but rather a poly mesh renderer, or non-parametric CAD, or something else that's specific.
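A sketch of what "fits the same math" looks like, loosely modelled on the coefficient-table approach discussed here (formulas of the type `width * height * constant`). The identifiers are mine, not necessarily the article's. Adding a shape that fits, say an ellipse with semi-axes as width and height, would be one new enumerator plus one new coefficient.

```cpp
enum ShapeType {
    Shape_Square,
    Shape_Rectangle,
    Shape_Triangle,
    Shape_Circle,
    Shape_Count // number of shape types; keep last
};

// For circles, width and height are both assumed to hold the radius.
struct Shape {
    ShapeType type;
    float width;
    float height;
};

// One coefficient per shape type, indexed by the enum value.
static const float kAreaCoefficient[Shape_Count] = {1.0f, 1.0f, 0.5f, 3.14159265f};

// Same arithmetic for every shape: no branch, no indirect call.
float area(const Shape& s) {
    return kAreaCoefficient[s.type] * s.width * s.height;
}
```

A shape that needs a third parameter or a different formula is where the tweak-or-rewrite discussion starts.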
I do agree. I've looked at code bases where the switch was replaced with virtuals and the like. It's kind of hard to navigate around the code when that happens. I think I'd usually rather have a completely separate path through the code rather than switch or virtual, making the decision at the highest level possible, usually the top-level caller.
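One possible reading of "a completely separate path, decided at the top level" is to keep each concrete type in its own container, so the type decision happens once when an object is stored. The `Scene` struct below is a hypothetical example of that layout, not something from the article.

```cpp
#include <vector>

struct Circle { double radius; };
struct Rectangle { double width, height; };

// Hypothetical container: the caller decides the type when it inserts,
// so the processing loops need neither a switch nor a virtual call.
struct Scene {
    std::vector<Circle> circles;
    std::vector<Rectangle> rectangles;
};

double totalArea(const Scene& scene) {
    double total = 0.0;
    // Each loop is a separate, homogeneous path through the code.
    for (const Circle& c : scene.circles) {
        total += 3.141592653589793 * c.radius * c.radius;
    }
    for (const Rectangle& r : scene.rectangles) {
        total += r.width * r.height;
    }
    return total;
}
```

The cost is that relative ordering between circles and rectangles is lost, which matters for some applications (draw order, for example).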
Indeed. If anything this demo shows how badly C++ polymorphism performs. It doesn't necessarily mean that all OOP languages are created equal. Although I have no data to prove anything, and frankly don't care, because all these arguments about clean vs dirty code are meaningless in the absence of formally defined rules and metrics universally enforced by some authority that can revoke your software dev license or something like that.
> It doesn't necessarily mean that all OOP languages are created equal.
Exactly - unless you're trying very hard, you're unlikely to beat C++ polymorphism with your OOP code in a different language. Which makes Casey's argument that much stronger. C++ with its relatively unsophisticated OOP and minimal overhead on everything, is as fast as you're going to get, so it's good for showing just how slow that still is if you follow the Uncle Bob et al. Clean Code tradition.
> C++ with its relatively unsophisticated OOP and minimal overhead on everything, is as fast as you're going to get
No it isn't. If your C++ compiler isn't devirtualising at all (implied by the article) it'll get stomped on by anything doing inline caching [0] which will generate the switch-case code. The JVM does that for example.
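For readers who haven't seen it, this is a hand-written C++ analogue of what a monomorphic inline cache boils down to: check for the type seen most often, call it directly, and fall back to normal dispatch otherwise. It is only an illustration with made-up classes, not what any particular JIT or compiler actually emits.

```cpp
#include <typeinfo>

struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};
struct Square final : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
};
struct Rectangle final : Shape {
    double width, height;
    Rectangle(double w, double h) : width(w), height(h) {}
    double area() const override { return width * height; }
};

// Fast path guarded by a type check; assumes Square is the common case.
double areaWithTypeGuard(const Shape& s) {
    if (typeid(s) == typeid(Square)) {
        // Qualified call: bound statically, so the compiler is free to inline it.
        return static_cast<const Square&>(s).Square::area();
    }
    return s.area(); // slow path: ordinary virtual dispatch
}
```

A JIT does this with runtime profile data and can undo it if the guess turns out wrong; doing it by hand in C++ is rarely worth it without measurements.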
"it is easier to make working code fast than to make fast code work"
That was from either the 1960s or the 1970s and I don't know that anything has changed in the human ability to read a mangled mess of someone's premature optimizations.
"Make it work. Make it work right. Make it work fast."
how to apply the observation above...
Most performance optimized code can be abstracted such that it reads cleanly, regardless of how low level the internals get. This is the entire premise of the Rust compiler/"zero cost abstractions". Or how numpy is vectorized under the hood. No need for the user to be exposed to this.
Writing "poor code" to make perf gains is largely unnecessary. Though there are certainly micro-optimizations that can be made by avoiding specific types of abstractions.
The lower level the code, the more variable naming/function encapsulation (which gets inlined), is needed for the code to read cleanly. Most scientific computing/algorithmic code is written very unreadably, needlessly.
Most C++ projects I've dealt with are neither clean nor performant. I'd rather follow clean code to improve maintainability and get things done than optimize for performance.
It’s also easier to find bottlenecks in a readable and testable code base than in a prematurely optimized one.
However, it is true that the more abstractions and indirections are used, the slower software gets.
Also, these examples are too basic to support a real-world recommendation: never assume something is slow in a large project because of indirection or something similar. Always get a profiler involved and run tests with time requirements to identify and fix slow-running parts of a program.
Damn, people are going bananas left and right about this article. I don't think Casey is targeting the general programmer audience, where sub-millisecond performance does not matter as long as the user experience and business needs are satisfied, but this is highly relevant in the world of HFT, where you try to optimise every instruction you execute.
He never said that people should write horrible code for performance; he was more pointing out the cost. Personally and professionally, we stay away from virtual functions as long as possible because of the unnecessary vtable lookup every time you want to call a method.
I think the post invites a question to all the HN responders:
What would it take for the computer [software] you are using to run at its full blazing potential?
The answer is - if every single piece of software was written while already knowing the true requirements, the scope of its use and re-use, and knowing the future bugs and security flaws that would appear, then it could be written one time and be BLAZING FAST.
Many parts would still be written in a 'Clean code' style for the necessary extensibility and testability, etc. But many others would be small and near optimal.
THEN on top of that, if the author or an equivalent talent came along and rewrote or supervised the optimization of the regular software, similar to how the article does, your system would be HYPER INSANE BLAZING FAST.
If we are proponents of OOP or Clean code, we need to acknowledge that fact. (i.e. My code may not be important but it all contributes to slowing down the computing world).
And if we think the Author is preaching gospel here, you should also acknowledge that because the future is so often unknown when we write code we often have no choice but to fill it with Clean code that can be easily changed later, and sometimes even 'Shit code' that we thought would never be used by anyone.
Most programming tends to decompose doing an operation for each element in a sequence into implementing the operation for a single element and then doing that repeatedly in a loop.
This is obviously wrong for performance reasons, as operations tend to have high latency but multiple of them can run in parallel, so many optimizations are possible if you target bandwidth instead.
There are many languages (and libraries) that are array-based though, and which translate somewhat better to how number crunching can be done fast, while still offering pleasant high-level interfaces.
Apart from performance, I find that "non-clean" code (in terms of the article) is sometimes easier to understand, reason about and maintain. Context is important, of course...
In plain C we can make virtual function calls faster by forwarding the pointers into the object instance.
Say we have this:
    int obj_api(object *o, char *arg)
    {
        return o->ops->api(o, arg);
    }
that's representative of how C++ virtual functions are commonly implemented. It gets more hairy under multiple inheritance and such.
It requires several dependent pointer loads. We must access the object to retrieve its ops pointer (the vtable) and then access the vtable to get the pointer to the function, and finally branch there.
To call that function a little faster we can go to this:
    int obj_api(object *o, char *arg)
    {
        return o->api(o, arg);
    }
in other words, forward the api function pointer from the static table to the object instance. Ok, so now each time we construct a new object, we must initialize o->api. And the pointer takes up space in each instance. So there is a cost to it. But it blows away one dependent load. And the "clean" structure of the program has not changed; it has the same design with virtual functions and all.
We could do this for some select functions that could benefit from being dispatched a little faster.
I don't think there is a way in C++ to tell the compiler that we'd like a certain virtual function to be implemented faster, at the cost of taking up more space in the object instance and/or more time at object construction time.
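For anyone who wants to see the two layouts side by side, here is a minimal sketch of the forwarding idea, written in C style but so that it compiles as C++. The names `object_ops`, `object_init`, `length_api`, `obj_api_via_table` and `obj_api_forwarded` are hypothetical, and `arg` is `const char *` here only so that string literals can be passed.

```cpp
#include <cassert>
#include <cstring>

struct object;
typedef int (*api_fn)(object*, const char*);

// Shared, per-type table of operations (the "vtable").
struct object_ops {
    api_fn api;
};

struct object {
    const object_ops* ops;  // classic layout: one pointer to the shared table
    api_fn api;             // forwarded copy of ops->api, set at construction
};

// Hypothetical implementation used for illustration: returns strlen(arg).
static int length_api(object* o, const char* arg) {
    (void)o;
    return static_cast<int>(std::strlen(arg));
}

static const object_ops length_ops = { length_api };

// "Constructor": pays the extra store into o->api once per object.
inline void object_init(object* o, const object_ops* ops) {
    assert(o != nullptr && ops != nullptr && ops->api != nullptr);
    o->ops = ops;
    o->api = ops->api;
}

// Two dependent loads before the call: o->ops, then ops->api.
inline int obj_api_via_table(object* o, const char* arg) {
    return o->ops->api(o, arg);
}

// One load before the call: o->api.
inline int obj_api_forwarded(object* o, const char* arg) {
    return o->api(o, arg);
}
```

The cost is visible in `struct object`: one extra pointer per instance for every forwarded function, plus the store in `object_init`.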
Reading the comments it seems like a lot of people missed this part.
> We can still try to come up with rules of thumb that help keep code organized, easy to maintain, and easy to read. Those aren't bad goals! But these rules ain’t it. They need to stop being said unless they are accompanied by a big old asterisk that says, “and your code will get 15 times slower or more when you do them.”
He isn't against organised and maintainable code, he just thinks the current definition isn't worth the trade-off.
For all the creeping featuritis that C++ is acquiring like a dirty snowball, doesn't it have a solution for this yet?
    virtual u32 CornerCount() = 0;
you should be able to declare a virtual data member
    virtual u32 CornerCount; // default value zero
how this would be implemented is that it simply goes into the vtable. ptr->CornerCount retrieves the vtable from the object, and CornerCount is found at some offset in that table, just like a virtual function pointer would be.
There is no need to pull out a function pointer and jump to it.
In C I would do it like this
    struct shape; // forward declaration so shape_ops can refer to it
    // Every shape has a pointer to its own type's static instance of this:
    struct shape_ops {
        unsigned (*area)(struct shape *);
        unsigned corner_count;
    };
    struct shape {
        const struct shape_ops *ops;
        /* shape-specific data follows */
    };
    // get_area looks like this:
    unsigned shape_area(struct shape *s)
    {
        return s->ops->area(s);
    }
    // the corner count isn't calculated so it's just
    unsigned shape_corner_count(struct shape *s)
    {
        return s->ops->corner_count;
    }
Everyone can override corner_count with their value. What you can't do is implement a calculation which determines the corner count dynamically, but that can be a reasonable constraint.
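A slightly fuller version of the sketch above, written so that it also compiles as C++, assuming one static ops table per concrete shape type. The `rectangle_ops` and `triangle_ops` tables, the `width`/`height` fields and the two area functions are invented for the example.

```cpp
#include <cassert>

struct shape;

struct shape_ops {
    unsigned (*area)(const shape*);
    unsigned corner_count;  // plain data stored next to the function pointer
};

struct shape {
    const shape_ops* ops;  // points at the static table for this shape's type
    unsigned width;
    unsigned height;
};

static unsigned rect_area(const shape* s) { return s->width * s->height; }
static unsigned tri_area(const shape* s) { return s->width * s->height / 2; }

// One static table per concrete type, each with its own corner_count.
static const shape_ops rectangle_ops = { rect_area, 4 };
static const shape_ops triangle_ops = { tri_area, 3 };

inline unsigned shape_area(const shape* s) {
    assert(s != nullptr && s->ops != nullptr);
    return s->ops->area(s);  // load ops, load function pointer, indirect call
}

inline unsigned shape_corner_count(const shape* s) {
    assert(s != nullptr && s->ops != nullptr);
    return s->ops->corner_count;  // load ops, load the value; no call at all
}
```

In this layout two shapes of different types report different corner counts because they point at different tables, while all shapes of the same type share one value.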
There is only one vtable object per base class; all shapes share the same vtable pointer and your virtual CornerCount would be shared across all Shape instances. You are describing a potential implementation for class static variables.
What is the author suggesting? To write software using infinite loops changing global state? Makes sense for video games but not for the custom enterprise software where clean code practices are usually applied.
The enterprise code must be easy to change because it deals with the external data sources and devices, integration into human processes, and constantly changing end-user needs. Clean code practices allow that, it's not about CPU performance and memory optimizations at all.
>The enterprise code must be easy to change because it deals with the external data sources and devices, integration into human processes, and constantly changing end-user needs. Clean code practices allow that, it's not about CPU performance and memory optimizations at all.
There are no good metrics that measure how "clean code" (at least the given rules) makes the code easier or harder to change and maintain.
All the Java style "enterprise type code" from my experience is bloated, full of boilerplate getters and setters and all sorts of abstractions that often make things harder and not easier to understand/maintain, etc.
However CPU performance is easy to measure, and sticking to "clean code" rules as given in the video demonstrably sets you back a decade in hardware progress/makes the code run 10x slower.
> Clean code practices allow that
This is what you believe, not something you can actually measure as far as I know
This thread is surprisingly back today with millions of comments. I don't know if anyone has pointed out that the functions called... do nothing. Hence it is understandable that the performance profile is dominated by dynamic dispatch.
Also, his code must have been compiled with an old compiler or less than -O3 as the switch/table version of the code performs exactly the same with Clang and g++ when compiled with -O3.
In my experience of writing enterprise software, the main offender is the N+1 query problem at an API boundary, i.e. when a module/package exposes only a method to process items one by one. If you suddenly want to process 1000 items instead of 5, you'll end up with 1000 separate DB calls, HTTP calls, etc. The same applies to gamedev, where the author is coming from: a naive renderer could switch shaders individually for every object when you want to sort by shader and switch only a few times. When performance suffers, you have to change 2+ modules (the client and the server) to support batch operations, and a lot of programmers don't have time to do it or simply can't do it because they use a third-party module/service they can't change. Inside a module you can default to clean code or switch to an optimized version if the need arises. In a small, well encapsulated/defined module you can write very simple code without overengineered abstractions, because it covers a simple model which doesn't need too much abstraction. So my take is: write small modules, design abstractions at the API boundary, and always expose batch operations.
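As a toy illustration of the N+1 point, here is a sketch of a store that exposes both a one-by-one call and a batch call. Everything in it is hypothetical: `UserStore`, `GetName`, `GetNames` and the `round_trips` counter stand in for a real DB or HTTP client, and the "round trip" is just a counter increment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a remote store; round_trips counts simulated
// DB/HTTP calls so the two access patterns can be compared.
class UserStore {
public:
    explicit UserStore(std::unordered_map<int, std::string> rows)
        : rows_(std::move(rows)) {}

    // One-by-one API: one round trip per id.
    std::string GetName(int id) {
        ++round_trips_;
        auto it = rows_.find(id);
        return it == rows_.end() ? std::string() : it->second;
    }

    // Batch API: one round trip for the whole list of ids.
    std::vector<std::string> GetNames(const std::vector<int>& ids) {
        ++round_trips_;
        std::vector<std::string> out;
        out.reserve(ids.size());
        for (int id : ids) {
            auto it = rows_.find(id);
            out.push_back(it == rows_.end() ? std::string() : it->second);
        }
        assert(out.size() == ids.size());
        return out;
    }

    std::size_t round_trips() const { return round_trips_; }

private:
    std::unordered_map<int, std::string> rows_;
    std::size_t round_trips_ = 0;
};
```

Looping over `GetName` costs one round trip per id, while `GetNames` costs one in total, which is the difference that matters once each round trip is a network call.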
Just in terms of readability and maintainability I find polymorphism to be significantly worse than switch-statements. It's hard to locate all the implementations of a particular function and read through them and edit them when they aren't in one place in a single switch statement. Higher performance is merely extra icing on the cake when using switch statements over polymorphism.
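A small sketch of what "all implementations in one place" looks like, in the spirit of the shape example from the video. The `ShapeType` enum, the `Shape` struct and the field names are assumptions made for the example, not code taken from the article.

```cpp
#include <cassert>

enum class ShapeType { Square, Rectangle, Triangle, Circle };

struct Shape {
    ShapeType type;
    float width;
    float height;
};

// Every area formula lives in this one function, so answering
// "how is area computed?" means reading exactly one place.
inline float Area(const Shape& s) {
    switch (s.type) {
        case ShapeType::Square:    return s.width * s.width;
        case ShapeType::Rectangle: return s.width * s.height;
        case ShapeType::Triangle:  return 0.5f * s.width * s.height;
        case ShapeType::Circle:    return 3.14159265f * s.width * s.width;
    }
    assert(false && "unhandled ShapeType");
    return 0.0f;
}
```

The flip side is that adding a new shape means editing this function (and every other switch over `ShapeType`), although most compilers can warn about an enum value that a switch does not handle.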
My biggest confusion with the "clean code" concept is, what does clean mean? Such a vague concept seems to invite arbitrary bikeshedding over how many lines a function should have, whether comments are good, etc.
In a kitchen, clean is a pretty objective concept: no dirt or grime, objects put away with similar objects. Not sure what it means in code, but it seems many people have strong, conflicting, subjective opinions about it. Doesn't seem like a good recipe for productivity or alignment.
I feel like it would be wiser to limit the concept of clean to the eradication of obviously "dirty" or "cluttered" things, like inconsistent style, or naming a module in a way that is misleading about its contents or functionality. Just as all different kinds of buildings can be clean, a code of "cleanliness" should not be so comprehensively prescriptive about architecture and organization. Use more appropriate names for those dimensions of code quality, rather than "clean" as the single stand-in for every good thing.
Already in his first example, where he says he doesn't use range-based for in order to help the compiler and get a charitable result, he doesn't get the point, I think. You write code in a certain way in order to be able to use abstractions like range-based for, or functional style. If you are hand-unrolling the loop, or using a switch statement instead of polymorphism, you lose the ability to use that abstraction.
Essentially the whole point of object orientation is to enable polymorphism without having big switch statements at each call site. (That, and encapsulation, and nice method call syntax.) When people dislike object orientation, it's often because they don't get or at least don't like polymorphism.
Most people, most of the time, don't have to think about stuff like cache coherency. It is way more important to think about algorithmic complexity, and correctness. And then, if you find your code is too slow, and after profiling, you can think about inlining stuff or using structs-of-arrays instead of arrays-of-structs and so on.
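For readers who haven't met the terms, here is a minimal sketch of arrays-of-structs versus structs-of-arrays. The `RectAoS` and `RectsSoA` types and their fields are invented for illustration; the extra `id` and `label` fields only exist to show data that the area loop never needs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-structs: each element carries every field, so a loop that only
// needs width and height still drags the unused fields through the cache.
struct RectAoS {
    float width;
    float height;
    int id;          // hypothetical extra fields, unused by the area loop
    char label[32];
};

inline float TotalAreaAoS(const std::vector<RectAoS>& rects) {
    float sum = 0.0f;
    for (const RectAoS& r : rects) sum += r.width * r.height;
    return sum;
}

// Struct-of-arrays: each field is stored contiguously, so the area loop
// touches only the two arrays it needs.
struct RectsSoA {
    std::vector<float> width;
    std::vector<float> height;
};

inline float TotalAreaSoA(const RectsSoA& rects) {
    assert(rects.width.size() == rects.height.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < rects.width.size(); ++i) {
        sum += rects.width[i] * rects.height[i];
    }
    return sum;
}
```

Both functions compute the same sum; any speed difference comes from memory layout, which is exactly the kind of thing to confirm with a profiler before restructuring.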
Perhaps some biases can be excused by committing to C++. Using dynamic dispatch in that language seems to be slow when it's pretty much always some vtable lookups under the hood, but it doesn't have to be that way. Implementations of other languages like Smalltalk or Common Lisp automatically apply (or have as options to specify) various strategies to make convenient abstractions a lot more performant. In Java Land the JVM can do many impressive things, and interestingly, having a larger number of smaller methods helps, as it tends to "give up" if a function is a giant sprawling mess.
A fun story from https://snakeisland.com/aplhiperf.pdf on the utility of people using standard inner product / matrix multiplication operators, instead of hard-coding their own loops or whatever:
> In the late 1970’s, I was manager of the APL development department at I.P. Sharp Associates Limited. A number of users of our system were concerned about the performance of the ∨.∧ inner product on large Boolean arrays in graph computations. I realized that a permuted loop order would permit vectorization of the Boolean calculations, even on a non-vector machine. David Allen implemented the algorithm and obtained a thousand-fold speedup factor on the problem. This made all Boolean matrix products immediately practical in APL, and our user (and many others) went away very happy.
> What made things even better was that the work had benefit for all inner products, not just the Boolean ones. The standard +.× now ran 2.5—3 times faster than Fortran. The cost of inner products which required type conversion of the left argument ran considerably faster, because those elements were only fetched once, rather than N times. All array accesses were now stride one, which improved cache hit ratios, and so on. So, rather than merely speeding up one library subroutine, we sped up a whole family of hundreds of such routines (even those that had never been used yet!), with no more effort than would have been required for one.
Or, just look at SQL. I sure appreciate not having to write explicit loops querying the correct indexes every time I want to access some data.
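The "permuted loop order" in the APL story can be illustrated with an ordinary matrix product. This is not the I.P. Sharp implementation, just a sketch of the general idea on row-major square matrices; `MatMulIJK` and `MatMulIKJ` are made-up names, and the output matrix is assumed to be zero-filled by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major n x n matrices stored as flat vectors; C must be zero-filled.

// Textbook i-j-k order: the inner loop walks B down a column (stride n).
inline void MatMulIJK(const std::vector<double>& A, const std::vector<double>& B,
                      std::vector<double>& C, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

// Permuted i-k-j order: A[i][k] is fetched once per inner loop, and the
// inner loop walks both B and C along a row (stride one).
inline void MatMulIKJ(const std::vector<double>& A, const std::vector<double>& B,
                      std::vector<double>& C, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const double a = A[i * n + k];
            for (std::size_t j = 0; j < n; ++j)
                C[i * n + j] += a * B[k * n + j];
        }
}
```

Both orders produce the same result; the second one has the stride-one access pattern and the "fetched once, rather than N times" property the quote mentions, and a caller of a library matrix product gets that change for free.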
David Farley’s new book is good. It advocates for the tenets of “clean code” (at least in all lowercase), but given his background I trust that he knows how to balance performance and code hygiene.
There are people that are wrong on both extremes, obviously. I’ve worked with one too many people that quite clearly have a deficient understanding of software patterns and try to pass it off as being contrarian speed freaks. Just as I’ve worked with architecture astronauts.
I’m particularly skeptical of YouTubers that fall so strongly on this side of the argument because there’s a glut of “educators” out there that haven’t done anything more than self-guided toy projects or work on startups whose codebase doesn’t need to last more than a few years. Not to say that this guy falls into those two buckets. I honestly don’t think I know him at all, and I’m bad with names. So I’m totally prepared for someone to come in and throw his credentials in my face. I can only have so much professional respect for someone that is this…dramatic about something though.
He's a gamedev by trade who also used to work at RAD Game Tools back in the day. You can argue he doesn't know how to work in large teams, because I don't think he has, but when it comes to understanding CPUs (or GPUs) as well as a dev can, he's about as capable as anyone on that front.
He is aggressive to a degree I am not a fan of, but when it comes to calling out bad practices, I find him more right than wrong. He is terrible, though, at delivering his message in a way that won't piss off anyone who didn't already agree with him.
I don't think the tradeoff here is between clean-but-slow code and fast-but-dirty code. It looks more like extensible-but-slow vs fast-but-locked-in. This is pretty obvious - that's why you need the indirection! It's not to satisfy some arbitrary aesthetic principle of cleanliness, it's to make the concept of a shape extensible to anything the calling code wants.
Within one codebase the two behave the same because you can just go rewrite your functions, but if those functions are locked away in someone else's library the "dirty" code flat-out prevents you from ever using more shapes than the library author implemented. Want to add a rhombus? An arbitrary polygon? An ellipsoid? Something defined by Bezier curves for some reason? Well, you just can't; sorry champ.
It's an interesting tradeoff to consider, though. Perhaps we write code like library authors too often, or optimise for extensibility when it isn't needed.
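A sketch of the extensibility side of that tradeoff, assuming a "library" that only knows about an abstract `Shape` and a "user" who adds a `Rhombus` later. The split into library and user code, and all names here, are hypothetical.

```cpp
#include <cassert>
#include <vector>

// "Library" side: knows nothing about the shapes callers will invent.
struct Shape {
    virtual ~Shape() = default;
    virtual float Area() const = 0;
};

inline float TotalArea(const std::vector<const Shape*>& shapes) {
    float sum = 0.0f;
    for (const Shape* s : shapes) {
        assert(s != nullptr);
        sum += s->Area();
    }
    return sum;
}

// "User" side: adds a shape without touching or recompiling the library.
struct Rhombus : Shape {
    Rhombus(float d1, float d2) : d1_(d1), d2_(d2) {}
    // Area of a rhombus: half the product of its diagonals.
    float Area() const override { return 0.5f * d1_ * d2_; }
    float d1_, d2_;
};
```

With an enum-and-switch design inside the library, `Rhombus` would need a new enum value and a new case in library code, which is the lock-in being described.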
Generally, it is good to give advice to write clean code. Undoubtedly, clean code causes fewer problems than unclean code (e.g., overengineered, overmodularized, or prematurely optimised).
Running speed tests on the much-cited mini-example with geometric shapes and their area is unfair and unrealistic, and it does not prove any point.
I think I can see where this is coming from: 'overly clean' OO style will split concerns into virtual one-liner functions without context distributed throughout the universe. For a simple problem, I prefer 'switch'. But that's not a good rule either. For anything extensible, like a GUI, 'switch' would be the wrong choice and virtual much better.
Programmers need to develop a feeling of appropriateness, and restructuring may be necessary at times.
BTW, the manual loop unrolling in the article is broken and not advisable at all. I'd be angry in code review about such 'optimisations'.
If, instead of benchmarking optimized code against non-optimized code, we measure the time until the user gets their answer, in many cases the non-optimized code will be several months faster. Why? Because it takes time to do optimizations, and I can ship the non-optimized version sooner.
Similarly, we can then look at an iterated design and realize the optimized code is frequently going to be harder to refactor or understand (a precondition of refactoring). So now the time until a customer gets their answer is delayed again.
The optimization step comes long after clean code. Clean code is most useful in the first 2 of the typical 3 steps[1]:
1. Make it work (iterations of what working even means)
2. Make it right (iterations of what right even means)
3. Make it fast.
There are all sorts of things: leaky abstractions, specializations for optimistic fast paths, dispatching on algorithms based on data distribution, runtime CPU dispatching, etc. Video: https://www.youtube.com/watch?v=ZOZQCQEtrz8
But: it has clean interfaces, virtual calls, factories... And, most importantly - a lot of code comments. And when you have a horrible piece of complexity, it can be isolated into a single file and will annoy you only when you need to edit that part of the project.
Disclaimer. I'm promoting ClickHouse because it deserves that.
I'm sympathetic to the rebellion against 'clean code'.
I think this obsession with clean code is a natural reaction to the overwhelming number of gotchas that seem to just come with the job.
It's a bit like when a parent watches their kid get hurt outside the house and in a complete overreaction locks the kid in the house for life.
Thing is, if I'm programming an airplane control system, I very much want that kind of pedantry I think. I really don't want to make a single mistake writing that kind of program. If I'm programming a video game, just let me write the code that I want. Nobody's going to die if it blows up in my face.
I'm not sure what should be the lesson from all this... Perhaps don't pick C++ unless you absolutely need to?
This is the first video I’ve seen by him. I’m by no means a fan of clean code. But I think he’s making a fool of himself here. Picking out 1 code example from the book doesn’t prove that much on its own. This stuff is so language, OS, hardware and compiler specific anyway.
The iPhone comparisons are extremely cringe. Real applications do so much more than this contrived example. Something that feels fast isn’t the same thing as something that is fast.
Would I advise beginner programmers to read this book? Sure, let them think about ways to structure code.
If he had just concluded that it is important to optimize for the right thing, that would be fine. But he seems more interested in picking a fight with clean code.
> But he seems more interested in picking a fight with clean code.
Or, more likely, a straw man.
"Clean" exists to provide some solutions to certain problems in TDD. Namely how to separate your logic so that units can be reasonably put under test without an exploding test surface and to address environments which are prohibitively recreated. If you don't practice TDD, "clean" isn't terribly relevant. As far as I am aware, it has always been understood that hard-to-test code has always had some potential to be more efficient, both computationally and with respect to sheer programmer output, but with the tradeoff that it is much harder to test.
It is useful to challenge existing ideas, but he didn't even try to broach the problem "clean" purports to solve. Quite bizarre.
Well, in my experience it is generally true that when you start optimising, things get less clean. Let me explain: most of the optimisation situations I've had looked like this. "Hey, this query is pretty slow and costs us quite a bit. Oh look, for this type of data it’s super easy: we can just return this, and then for the rest of the data we can now assume this." So you have broken a single clean and nice case into two slightly less clean but faster cases. And this breaking apart then continues, becoming less and less clean, because you rely on some obscure characteristic of that specific type of data.
For the record, runtime polymorphism is generally frowned upon in the most modern C++ practice.
The only difference between “modern, clean” C++ and the author’s switch is probably a concept that requires some type attributes.
The example is contrived, and the realization of “clean” code through runtime polymorphism is both dangerous and odd. The whole point of not using polymorphism is to catch runtime crashes at compile time, reduce overhead and improve readability. I know many people who wouldn’t use an object here anyway. Free functions would do nicely, and are infinitely compositional.
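One possible reading of the "free functions" suggestion, sketched with plain overloads and a template. This is an assumption about what the comment has in mind, not the commenter's code: the `Square` and `Rectangle` structs and the `TotalArea` helper are invented, and the requirement that a C++20 concept would spell out is left implicit here.

```cpp
#include <cassert>
#include <vector>

struct Square { float side; };
struct Rectangle { float width, height; };

// Free functions, one overload per shape; no base class, no vtable.
inline float Area(const Square& s) {
    assert(s.side >= 0.0f);
    return s.side * s.side;
}
inline float Area(const Rectangle& r) { return r.width * r.height; }

// Works for any type T that has an Area(const T&) overload. The call is
// resolved at compile time, so a type without an overload fails to compile
// instead of failing at run time.
template <typename T>
float TotalArea(const std::vector<T>& shapes) {
    float sum = 0.0f;
    for (const T& s : shapes) sum += Area(s);
    return sum;
}
```

The price is that each container holds a single shape type; mixing shapes in one collection needs something extra, such as one container per type or a variant.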
But I think there is a reason for the existence of "clean code" practices: it makes devs easier to replace. Plus it may create a market to try to optimize intrinsically slow programs!
It doesn't just make devs easier to replace. It makes my job more pleasant (and that of my colleagues). But yes, you're right. It does also help onboard people.
Imagine working as a barista with a disorganised bar, a mat on the floor that keeps sliding and a corner is sticking up, and one bag of beans where half the side is decaf and the other is normal.
Now compare that to working in a more common sense coffee shop: everything is in its place, the mat isn't decrepit, and you have multiple bean bags.
In which one do you think it's easier to make coffee?
Huh? Coffee shops optimize for people not bumping into each other and having related items close together, and don't pretend to not know what kind of gear they have.. that's not a terrible analogy to the exact opposite argument.
I think the sentiment is that order and organisation is helpful in achieving goals and cultivating a good working environment as opposed to a big mess. Analogies, just like abstractions, are leaky.
Yeah, but this one leaks a smart-matter paint that self-assembles into a shape of text saying "the order and organization is not the goal, but a consequence of ruthlessly optimizing for performance above all".
It seems like you may be assuming that Casey is arguing against writing clear code, which he is not.
He is arguing that you should just write the simple thing and usually that is also the most clear, readable and "maintainable" code because it is easy to get an overview of.
So what he is arguing for does not fit your coffee shop example, because of course no one should write unreadable code.
The argument is that sometimes, by taking a step back from how you were taught to write clean code, the code could be simplified in a way that is _also_ performant by default.
This advice doesn’t really differ from the actual ‘source material’ though. It’s really arguing against the people that learned what “clean code” is from a blog post or a Tweet or (most likely of all) another YouTuber that tried to take a complex engineering topic that they don’t have the experience to understand, and shove it into a video-listicle full of DigitalOcean ads and forced facial expressions.
You see the same thing with microservices. Any of the reading material by the big / original proponents of microservices is actually quite good at giving you all the reasons why they probably aren’t for you. But that doesn’t stop the game of telephone that intercepts the message before it gets to most developers.
So I really just see this whole thing as someone saying “RTFM”, rather than it being any sort of derived nuanced take.
The sooner a professional software developer can get themselves off the treadmill of garbage trendy educational content, the better.
> people that learned what “clean code” is from a blog post or a Tweet or (most likely of all) another YouTuber
Or, you know, college. Though I can only speak of my local tech college, not full blown university. I don't bear them any ill will --- there's a hell of a lot to try to teach in two years --- but a lot of the things that were taught in my degree, were very dogmatic.
Personally I think we'd be in a better spot today if instead of class hierarchies and polymorphism the thing that would go mainstream was Entity-Component-System approach with composition instead of inheritance.
If null was a one billion dollar mistake, then clean code, SOLID and design patterns are 10 billion dollar mistakes. Think of all the CPU cycles wasted across all the data centers and users' devices.
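For anyone unfamiliar with the Entity-Component-System idea mentioned above, here is a deliberately tiny sketch. It assumes every entity has both components, which real ECS libraries do not require, and the `World`, `Position`, `Velocity`, `Spawn` and `MovementSystem` names are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Components are plain data; an entity is just an index into these arrays.
struct Position { float x, y; };
struct Velocity { float dx, dy; };

struct World {
    std::vector<Position> positions;
    std::vector<Velocity> velocities;

    // Creates an entity with both components and returns its id.
    std::size_t Spawn(Position p, Velocity v) {
        positions.push_back(p);
        velocities.push_back(v);
        return positions.size() - 1;
    }
};

// A "system" is a free function that runs over every entity's components.
inline void MovementSystem(World& w, float dt) {
    assert(w.positions.size() == w.velocities.size());
    for (std::size_t i = 0; i < w.positions.size(); ++i) {
        w.positions[i].x += w.velocities[i].dx * dt;
        w.positions[i].y += w.velocities[i].dy * dt;
    }
}
```

Behaviour is composed by choosing which component arrays an entity appears in and which systems run, rather than by where a class sits in an inheritance tree.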
> We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%
The author of the post fails to articulate how we strike a healthy balance and instead comes up with contrived examples to prove points that only really apply to contrived examples.
You managed to misunderstand both Casey Muratori and Donald Knuth. You are not alone, the majority of the industry seems to have gotten it wrong.
Casey tells you that following "Clean Code" can give you a huge performance hit for no obvious benefit. And even if "Clean Code" were to be more maintainable (it's not; in my experience, it's actually worse for maintainability), you should still be extremely aware of the cost you're likely to pay down the track. It's not a contrived example, it's literally textbook "Clean Code".
I'll say it again: "Clean Code" gives you slower, less maintainable code, and you get nothing from it. Maybe you can afford it, maybe in your use case it's not a big deal, but you should be informed.
Knuth tells you to measure before optimizing, which Casey did. Knuth does NOT tell you "don't worry about performance, you'll optimize later". You quoted Knuth but stopped right before the best part:
> A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; BUT ONLY AFTER THAT CODE HAS BEEN IDENTIFIED [emphasis mine]. It is often a mistake to make a priori judgments about what parts of a program are really critical, since the universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail.
To recap: "A good programmer will not be lulled into complacency by such reasoning" - in other words, just because 97% of the code may not need optimization does NOT mean you should not be thinking about performance.
Knuth's point is that when identifying hotspots, programmers were relying on intuition rather than measurement. That's what he meant by "premature optimization". Knuth did not mean (especially since it was the 70s) you should write "Clean Code" that you know has worse performance for little benefit.
And Knuth does not write "Clean Code", by the way.
> The author of the post fails to articulate how we strike a healthy balance
There is no healthy balance between a good idea and a bad idea. Just eliminate the bad idea.
There's a good Bjarne Stroustrup quote (mentioned in Clean Code) that is more useful here (as well as a bunch more on optimization / efficiency that are worth reading):
"I like my code to be elegant and efficient. The logic should be straightforward to make it hard for bugs to hide, the dependencies minimal to ease maintenance, error handling complete according to an articulated strategy, and performance close to optimal so as not to tempt people to make the code messy with unprincipled optimizations. Clean code does one thing well" [1]
Also worth noting what his idea of a "small efficiency" was; he goes on to note that, say, 12% efficiency, easily gained, is not small at all!
He never says to throw all ideas of performance out the window when writing your initial run of code. It's just not worth it to dig down into the weeds and micro-optimize everything ahead of time, is all.
But people just take this quote as liberty to completely ignore all notion of performance in their code. Maddening, and a total disservice to Knuth.
This problem feels like a no runtime type problem. Highly dynamic languages have tonnes of grim problems like this, that they have to deal with because there is no good type information anywhere so stats at runtime is how you optimise. Raku for example has a lot of specialising runtime optimisations where all the virtual function calls get specialised and become dynamic by exception when the VM executes code.
I patiently waited until the end of this video, hoping there'd be a punchline... but, turns out it's one of those C++ selfawarewolves kind of thing.
I mean, dude discovered C++ compiler sucks after over 40 years of trying to make it not suck so much, but ignores the fact that his tools are broken and proceeds to make completely unwarranted conclusions from that.
Needless to mention that software needs to be first and foremost correct. "Clean code" is about reducing the chance of a programmer making certain kinds of mistakes. And even in the situation where the compiler sucks, it's still worth paying the price in terms of speed if you can get more confidence that your code does what it's supposed to. Just like structured programming, "clean code" is an attempt to reduce the complexity the author of the code has to deal with.
----
The proper conclusion from his experiments should've been: maybe something went wrong with the language and tools I'm using, if even after a massive effort over several generations of programmers, with mega-corporations backing that effort, the tools and the language still suck. So, the desirable properties of my programs (i.e. simplicity and ability to be extended) still come at a huge cost.
The problem with this is that in most real world scenarios it is much cheaper to add more hardware resources to slow performing apps than hiring, training and retaining programmers to learn, debug, enhance and maintain poorly written code, particularly when the useful life of many software solutions can last decades and hardware becomes cheaper every year.
Sometimes I see very good programmers writing functions that IMO are much much too long and have far too much internal state for one function. I believe there might be a correlation between these people having C++ backgrounds. Other than that I'm just mentioning it as an observation.
He’s basically showing data-oriented design, where you try to limit CPU cache misses by operating directly on the data.
This approach can be way faster, but it is only relevant when you have a lot of entities to iterate over. If you have 3-100 objects it would of course still be faster, but by a negligible amount.
The author of the video is apparently referring to Uncle Bob’s first book (Clean code, 2009), which essentially says that “Clean” code is “understandable” code: created with care, thinking about the next reader.
So yeah, the book then goes on for a painful 450 page ramble, opinions, and admittedly arbitrary rules.
But Martin was at least partially aware of this:
> “Clean code is not written by following a set of rules” — quote from the book!
So really, the person in the video failed to apply the Principle of Charity, which is fundamental in critical thinking.
They end up not addressing the interesting claim, and openly attacking a Straw Man.
As for the deeper points implied in the video, they seem –ironically– less fresh:
- Software is slow these days
- Performance matters
- The way you write code impacts performance
- Don't blindly follow rules and generic advice
Groundbreaking!
If anything, the video shows the failures of C++ as a language. Why aren't languages designed to promote maintainability without sacrificing performance? :Rust enters the room:
The more interesting claim that the video's author missed:
> “It is not enough for code to work.” ― quote from the book
Given that code in the editor is for human consumption, I wonder why it can't be restructured for the compiler to make it fast (or why the compiler can't make it fast itself). After all, you could leave annotations, for example on structs, so the compiler knows what their size will be and can optimize for it.
I take issue with the idea that maintainable code is about "making programmers' lives easier", rather than "making code that is correct and is easy to keep correct as it evolves". Correctness matters for the user - indeed, sometimes it is a matter of life and death.
> I take issue with the idea that maintainable code is about "making programmers' lives easier"
He's talking about "clean code", not maintainable code. The claim that "clean code" is more maintainable is an unproven assertion. Whenever I interact with a "clean code" codebase, it is worse in every way compared to the corresponding "non-clean" version, including in terms of correctness.
Maintainable code or performant code, yes. That's always a tradeoff. The most high performance code will be manually tuned assembly, but I don't see the author writing in assembly, so he's already made some tradeoff against performance. It's all down to your priorities.
Lots of very correct things said there...except for this:
"The more you use the “clean” code methodology, the less a compiler is able to see what you're doing. Everything is in separate translation units, behind virtual function calls, etc. No matter how smart the compiler is, there’s very little it can do with that kind of code."
I suppose even that is true, but JIT compilation regularly walks right around those problems. Yes, your code is written to say virtual this, or override that...but the JIT don't care. Is it looking at a monomorphic call site? Or even if it's not, is it ok to think of it as monomorphic right now? Great -- inline away.
All that being said...I once got into a readability tiff over the use of a Java enum in a particularly performance sensitive chunk of code. I went with ints so I could be very, very explicit about exactly what I wanted, and the rather large performance gain...and lost. Yay!
Your mileage may vary, and your measurements may vary.
The article gives advice from the past. Nowadays it is all about zero-cost abstractions and automatic optimization. These trends will only solidify in the future, defining the new norm. And until that future fully arrives, optimize for your bottlenecks.
One can be tempted to like any assault on "Uncle Bob"'s insulting videos in the light of working on a codebase where every 2nd line forces you to jump somewhere else to understand what it does. That sort of thing generates a rebellious feeling.
OTOH the class design lets someone come and add their new shape without needing to change the original code - so it could be part of a library that can be extended and the individual is only concerned about the complexity of the piece they're adding rather than the whole thing.
That lets lots of people work on adding shapes simultaneously without having to work on the same source files.
If you don't need this then what would be the point of doing it? Only fear that you might need it later. That's the whole problem with designing things - you don't always know what future will require.
That's one side of the expression problem; the other is adding a new operation. With dynamic dispatch, you can add new shapes without altering the others, but if you want to add a new operation (e.g., perimeter()) then you have to modify the base class and all the children. With discriminated unions, adding a new shape requires modifying all the operations, but adding a new operation only requires the creation of a new function.
This is promoting early optimization, precisely to the people who need to avoid doing that. Added to the pile of #HorrificAdvice and #TerribleGeneralization.
This example exists in such a vacuum and is so distant from real software tasks that I just have to shake my head at the "clean code is undoing 12 years of hardware evolution"
It is, but that's the catch. It would be exceedingly hard to provide understandable examples of these things in code that is complex enough to need the patterns.
These days the cost of a programmer is probably a lot greater than the cost of execution, so some of these rules ("prefer polymorphism") are likely worth the tradeoff.
Cost of execution to who? If you don't care about the speed of your program when I execute it, we end up with electron based VPN GUIs that have menus that run at a few frames per second or electron based disk formatters that are a 400 MB download to ultimately run a command line process.
If you don't care about execution speed, I don't want to use it.
Yay, yet another contrived example to support somebody's position against a process that works fantastically at least 90% of the time. And yes, I say 90% of the time (as a low-ball) because there _is no perfect process, framework, ideology, <insert x here>_. Everything has compromise. The compromise to make all code chase performance at the cost of maintainability is rubbish as a blanket choice, but may be necessary in certain niche situations.
Malicious compliance for C++ programmers. This is the person who thinks they're clever for breaking stuff because "You didn't say not to". Managing them out of your team is likely to be the biggest productivity boost you can achieve.
In the process of "improving" the performance of their arbitrary benchmark they make the system into an unmaintainable mess. They can persuade themselves it's still fine because this is only a toy example, but notice how e.g. squares grow a distinct height and width early in this work which could get out of sync even though that's not what a "square" is? What's that for? It made it easier to write their messy "more performant" code.
But they're not done, when they "imagine" that somehow the program now needs to add exactly a feature which they can implement easily with their spaghetti, they present it as "giving the benefit of the doubt" to call two virtual functions via multiple indirection but in fact they've made performance substantially worse compared to the single case that clean code would actually insist on here.
There are two options here, one is this person hasn't the faintest idea what they're doing, don't let them anywhere near anything performance sensitive, or - perhaps worse - they know exactly what they're doing and they intentionally made this worse, in which case that advice should be even stronger.
Since we're talking about clean code here, a more useful example would be what happens if I add two more shapes, let's say "Lozenge w/ adjustable curve radius" and "Hollow Box" ? Alas, the tables are now completely useless, so the "performant" code needs to be substantially rewritten, but the original Clean style suits such a change just fine, demonstrating why this style exists.
Most of us work in an environment where surprising - even astonishing - customer requirements are often discovered during development and maintenance. All those "Myths programmers believe about..." lists are going to hit you sooner or later. As a result it's very difficult to design software in a way that can accommodate new information rather than needing a rewrite, and yet since developing software is so expensive that's a necessary goal. Clean coding reduces the chance that when you say "Customer said this is exactly what they want, except they need a Lozenge" the engineers start weeping because they've never imagined the shape might be a lozenge and so they hard coded this "it's just a table" philosophy and now much of the software must be rewritten.
Ultimately, rather than "Write code in this style I like, I promise it will go fast" which is what you see here, and from numerous other practitioners in this space, focus more on data structures and algorithms. You can throw away a lot more than a factor of twenty performance from having code that ends up N^3 when it only needed to be N log N or that ends up cache thrashing when it needn't.
One good thing in this video: They do at least measure. Measure three times, mark twice, cut only once. The engineering effort to actually make the cut is considerable, don't waste that effort by guessing what needs changing, measure.
Fun fact. Software you routinely use, like Linux and Postgres, is written in the same "spaghetti" style. I guess the Clean Code people who implement CRUD features for a living are better programmers than the ones writing operating systems and databases.
> Most of us work in an environment where surprising - even astonishing - customer requirements are often discovered during development and maintenance.
Another fun fact. The author of the video works in an environment where rapid iteration is absolutely vital. I'd pay good money to see a TV show where his style of programming ("spaghetti", as you claim) runs laps around your "Clean Code". Because it would. For example, he wrote a terminal emulator in a weekend to prove that Microsoft doesn't have a clue about how to write code (I assume they also have many Clean Code people, and that it would take them about 6 months to write a terminal emulator from scratch).
The reason why this video mentions performance is probably because 1) the author has a course on performance and 2) it's something you can objectively measure.
If it were me, I'd not even bring performance into discussion, I'd just say that "Clean Code" significantly hurts readability and editability (and thus maintenance). But then you jump in to say that the non-Clean Code version is "an unmaintainable mess", and then we go around in circles. Which is probably why performance is his main point.
> I'd pay good money to see a TV show where his style of programming ("spaghetti", as you claim) run laps around your "Clean Code".
You'd like to pay money to be assured that you're right? I prefer to have an informed opinion based on actually trying stuff out and measuring†, I also find that as a result I don't feel the need to pay for validation.
> For example, he wrote a terminal emulator in a weekend to prove that Microsoft doesn't have a clue about how to write code
So, your thesis is that writing a terminal emulator - software which is pretending to be hardware that existed 40+ years ago, shows that this person is great at handling surprising requirements changes during development ?
An insistence that the only thing we can measure is performance and specifically speed and therefore that's the only thing that matters is nonsense. It's just that this technique does so poorly when judged on maintainability that it's no contest.
Try it, first add the hollow box example shape, in the Clean Code this is very easy and we'll notice immediately it's de-coupled, colleagues working with abstract shapes don't care at all about our Hollow Box shape, it all just works with the abstract APIs.
The less-clean switch approach is a little bit hairier now, but it's very possible although we may notice now our objects are all bigger again, even though perhaps few are hollow boxes, they're all bigger as they all need to track the possible state of a hollow box. So that's actually a significant performance degradation for some applications in our supposedly "high performance" solution...
The table-driven approach needs a rewrite though, the F*W*H simplicity doesn't apply any more, there are a few "minimalist" approaches, all of them awful compromises waiting for the other shoe to drop - so perhaps a big bang rewrite is called for. Ouch.
Now, having learned from our hollow box experience, let's add Regular Star Polygons next. These are pretty interesting shapes - but we're shape classes so no reason we can't handle this, the stars have a defined area and a defined number of vertices ("corners"). But while the Clean Code here is very tractable, the dirtier approaches start to hurt pretty bad now.
Notice that under Clean Code the exact implementation of Regular Star Polygons doesn't affect anybody else, their code all still works regardless. For example maybe we should sub-class popular examples like the 5/2 and 6/2 rather than taking p and q parameters, doing this works fine under Clean Code, since it's nobody else's business.
† EtA: One of the most important innovations in years has been Godbolt.org, Matt Godbolt originally worked on this tool to examine exactly this sort of question, it's one that comes up early in the talk you liked - can we safely use actual C++ iterators? Wouldn't an old-fashioned for loop be faster in some cases? Matt's answer was "Yes", you can use iterators, the iterators produce exactly the same machine code and the tool he used to demonstrate that evolved into Compiler Explorer, the godbolt.org site today.
> You'd like to pay money to be assured that you're right?
I know I'm right, I'd pay money to see the embarrassment of the presumptuous Clean Code people who think that they can write maintainable code better than those who write software that matters (like Linux or Postgres, as mentioned before).
> So, your thesis is that writing a terminal emulator - software which is pretending to be hardware that existed 40+ years ago, shows that this person is great at handling surprising requirements changes during development ?
No. My thesis is that this guy can write code better than people who are supposed to be in the top 1% of the developers (well, it's Microsoft, not a web app sweatshop).
He works on games, where you not only have to iterate very quickly but you also may need to completely change direction halfway through the project. They can't have an "unmaintainable mess", otherwise they're not shipping the game, so your premise is wrong from the start. Also, games are much more complicated to program than the average Clean Code Crud app project that Uncle Bob bikesheds on.
> An insistence that the only thing we can measure is performance and specifically speed and therefore that's the only thing that matters is nonsense
Nobody said that, how are you coming up with this stuff?
The point Casey was making is that you're paying a significant performance penalty (speed, in this example) by doing Clean Code, which is true. You're denying yourself very basic performance techniques if you close your eyes and pretend to not see the internals of each shape.
> Try it, first add the hollow box example shape,
What if you don't have to? What if those are all the shapes you have to support? But there's ten billion shapes, you chose to do Clean Code and now you have slow code for zero benefit. Ouch.
On the other hand, if you want to add more shapes then the problem you're solving changed, and therefore you need to change the code (not really the tragedy you're making it out to be). I don't get this obsession, "code should change as little as possible"; it's actually a "careful what you wish for" moment because class hierarchies tend to become very rigid and difficult to change. Good luck making significant changes when your 100k+ loc program relies on a particular class hierarchy being in place.
> the iterators produce exactly the same machine code
That's great but it seems you're trying really hard to interpret Casey's video in bad faith. The way I understood it, he used an old fashioned for loop to avoid detracting from the main discussion, as not everyone knows what are the internals of STL iterators and how they translate to machine code.
> I know I'm right, I'd pay money to see the embarrassment of the presumptuous Clean Code people who think that they can write maintainable code better than those who write software that matters (like Linux or Postgres, as mentioned before).
You believe you're right, which of course you do, you almost can't help it.
Still, you keep mentioning Linux and it's worth a moment to consider that Linux actually does have the flavour of problem the Clean Code is modelling here, and it does indeed solve it the way Clean Code recommends and which Casey warns you will have egregiously bad performance. Several whole CPU cycles slower than Casey's spaghetti in fact.
Let's look more closely. In Casey's Clean example the Shape subclasses have to carry a table of functions to call to find out e.g. the area of that Shape, and then as we walk our array of Shapes we use these tables to call the appropriate area function. This costs us a dereference, which takes a few cycles.
With any luck, as this was described, your knowledge of how an OS works warmed up and you realised, "oh, that's, that's actually how the OS kernel works". Yup. Obviously Linux isn't written in C++ and so it has to actually hand-write the code to make tables of functions, so actually it's a bit clearer in the source, we can see that sure enough the implementation of CIFS for example and the implementation of XFS, and the implementation of FAT all just provide tables of functions.
So if Casey is right, shouldn't Linux be crushed by some 20x faster OS made by Casey or similarly minded games programmers (maybe Jonathan Blow) without these low performance tables? Nope, there are two good things to know here.
Firstly, this flexibility is immensely useful and it turns out most users can't live without it. A product which is 20x faster but can't do what you need is at best irrelevant, at worst a nuisance, a distraction. The demo shape project really needs to be able to be extended for arbitrary shapes.
Secondly, and this is often much more important in practice yet Casey just completely ignored it, this is a fixed cost overhead. Multiplying two numbers together is almost no work, so the overhead dwarfs the real work done, but in real software we are often doing a lot of work, yes even despite the "Single responsibility" rule and as a result the overhead is negligible in practice.
While we're in here it's important to notice that the overhead occurs because of the actual indirection, which was incurred in the C++ by the use of virtual function calls to several distinct types of Shape, and in our Linux kernel example by the use of several different filesystems via a table. You are not paying this overhead merely for the existence of functions to allow separation of concerns although that's what Casey implies.
> You're denying yourself very basic performance techniques if you close your eyes and pretend to not see the internals of each shape.
You're spending an unaffordable amount of your finite engineering resource on handling other people's problems in all of your code if you insist on peering inside everything as Casey does in this toy example.
This reminds me of the argument in Hare (another programming language from people who figure they're smarter than everyone else) that they shouldn't provide generics because you ought to build a custom data type each time you need something reflecting exactly what you needed each time. When I read that I decided to look briefly at how their compiler used a hashmap (IIRC) and of course it was buggy because it's all hand rolled and so it has a typical mistake you might make in your first attempt with a hashmap - as every hashmap in a Hare program is its own custom first attempt. I believe they subsequently fixed the bug after I reported it, so that's nice - until next time.
> He works on games, where you not only have to iterate very quickly but you also may need to completely change direction halfway through the project
Casey's only notable actual game project completed seems to have been The Witness, Jonathan Blow's second and more ambitious but arguably less successful game. Casey has worked in the games industry for a long time, but like Blow he's spent a lot more time telling other people he could do better than he has spent on actually demonstrating that.
In the time it took Jon and Casey to ship one game, John Carmack's id Software shipped Doom, some Doom sequels and Quake and some Quake sequels. Jon and Casey are not people to take your cues from if you want to have agile software development practices or ship products in a timely fashion.
> Linux actually does have the flavour of problem the Clean Code is modelling here, and it does indeed solve it the way Clean Code recommends
Getting a bit desperate here, eh? On one hand, having tables of function pointers does not introduce any code constraints, you can switch to switches or anything else at a moment's notice; class hierarchies are much more rigid (some random Torvalds quote, "all your code depends on all the nice object models around it, and you cannot fix it without rewriting your app"). On the other hand, Clean Code is fundamentally tied to OOP and classes. Here, straight from the horse's mouth [1]:
"This expectation of polymorphism is the essence of OO programming. It is the reductionist definition; and it is inextricable from OO. OO without polymorphism is not OO.
C and Pascal programmers (and to some extend even Fortran, and Cobol programmers) have always created systems of encapsulated functions and data structures. It does not require an OOPL to create and use such encapsulated structures. Encapsulation, and even simple inheritance, is obvious and natural in such languages. (More natural in C and Pascal than the others.)
So the thing that truly differentiates OO programs from non-OO programs is polymorphism.
You might complain about this by saying that polymorphism can be achieved by using switch statements or long if/else chains within f. This is true, so I must add one more constraint to OO.
The mechanism of polymorphism must not create a source code dependency from the caller to the callee."
In short, C is not OO because it doesn't do polymorphism (as understood in the context of Java-like OO languages rather than a mystic "it kinda looks and does the same as OO, therefore C is OO"). Furthermore:
"FP and OO work nicely together. Both attributes are desirable as part of modern systems. A system that is built on both OO and FP principles will maximize flexibility, maintainability, testability, simplicity, and robustness. Excluding one in favor of the other can only weaken the structure of a system".
Which is to say, if you don't do OO then you're not doing Clean Code. On a side note, Robert Martin obviously thinks he could write a Linux that's more flexible, maintainable, testable, simple and robust, but he's leaving it as an exercise to the reader.
> You're spending an unaffordable amount of your finite engineering resource on handling other people's problems in all of your code if you insist on peering inside everything
It's not quite so dramatic, you don't need access to STL's internals, just the structures you're working with anyway. To simplify: if you have an algorithm that deals with shapes then don't abstract away the concrete types, don't try to impose a taxonomy, don't pretend there's a magic shape interface that generalizes everything, don't try to fit the square box in a round hole. Instead, allow the algorithm to deal with concrete types. This is in fact the most flexible approach - you won't find yourself having to rethink your class hierarchy when one of your classes doesn't neatly fit into the general picture. I've been in the situation where at the end of a project it becomes very obvious that the chosen class hierarchy is actually unsuitable for easily adding more features and improving performance, but by that point the effort to restructure the hierarchy is equivalent to a rewrite. But hey, we had Clean Code.
The key point is that Casey's approach allows you to easily optimize for performance if needed; Clean Code does not.
> Casey and Jonathan Blow are too slow to deliver products
Fine, take Mike Acton. Same ideas, except he had to ship games on demand. You won't catch him doing Clean Code.
Just keep in mind that to actually keep that 1.5x or even 10x performance boost you need to apply these techniques consistently to a laaarge code base (performance critical apps tend to be this).
This means that in 2-3 months you end up with a codebase that is very difficult to work with, team members tripping over each other due to bad deps and abstractions, and your iteration time starts shooting up.
Doesn't seem like a realistic avenue to choose except maybe when coding to a final spec?
What is demonstrated here is that if you understand well the different parts of some code, you can recombine them in more efficient ways.
This is something very good to have in mind, but it must be applied strategically. Avoiding "clean code" everywhere won't always provide huge performance wins and will surely hurt maintainability.
Such a misguided article.
What I constantly fight at work is poorly written, unmaintainable code.
The code that needs to be fast is 1% or less.
Use the three-step rule when implementing something:
1. Make it work
2. Make it pretty
3. Make it fast (measure and optimize where it matters)
Don’t optimize early. 99% of code doesn’t have to be fast, it has to be right. And as code needs to be maintained, it also needs to be easy to read/change, so it stays right.
You shouldn’t do things that make your code utterly slow though.
"Game developer optimizes code for execution as opposed to readability that 'clean-code' people suggest".
There are a few considerations:
- most code is not CPU bound, so his claim that you are eroding progress because you are not optimizing for CPU efficiency is baseless
- writing readable code is more important than writing super optimal code (few exceptions: gaming is one)
- using enums vs OOP is not changing the readability at least to me
I think we can have fast and readable code without following the 'clean-code' principles and at the end it does not matter how much gain we have CPU cycle-wise.
“CLEAN” doesn’t care so much about readability, but rather testability. The video would have been far more compelling if Casey had spent more time showing how he would test his application without the test surface area blowing up exponentially as the feature space expands.
I think if you dig into why any of these tools perform poorly, you probably won’t find it’s because they were implemented using “Clean Code”, but rather that it’s down to a combination of many different things. I can’t say anything concrete because I don’t know which applications you are referring to.
But IMO framing it as clean vs performant is a mistake
>I don't find Casey's performant version less readable
It does create implicit coupling. If you try to add a new shape you will run into the problem.
In the clean code version, your compiler will remind you to implement calculateArea
With his version you have to add a new `case` to every switch statement and hope you didn't miss one with a default case, because the compiler won't catch this one.
people using clean code ideologies are being prematurely pessimistic and assuming they know much more about a problem than they actually do when they use these clean code techniques. "I don't know how many shapes I've been asked to do, so I'll assume the worst case scenario and make the code slower and harder than the simple naïve solution that would be hard to read(debatable) if we had one million shapes" is a terrible argument and it is why everything goes slow.
The correct way to deal with this is refactoring to more maintainable code once you know the number of shapes will change wildly. As soon as we get too many shapes, the problem has changed. You can only pretend to know what the best architecture for a problem is when you have dealt with it several times.
Clean code apologists pretend their single time dealing with website backend is proof enough that clean code works and that it works for every problem and that it has to be the default approach and is the most readable for most problems. It is a total insanity for something that can't be measured with any tool.
Edit:
I fully understand that "premature optimization is wrong" but using these "guidelines" is premature optimization of scalability and maintainability. Somehow when the "premature optimization" is about things you people want that's somehow okay? pff
Also, I don't find clean code readable, it looks like complex, un-refactorable garbage to me 9 out of 10 times. No wonder people are so fucking scared of rewriting a class and act as if it will take months to do so. This ideology makes it impossible to actually play around with your code; you can make it neither more readable nor more performant, you are locked in with a sluggish collection of dozens of files even for the simplest of problems.
> people using clean code ideologies are being prematurely pessimistic and assuming they know much more about a problem than they actually do when they use these clean code techniques.
No. The techniques are there to let you change your codebase as you learn more about the problem.
there are two ways in which I can understand your message:
a) "these techniques allow for an easy to refactor codebase". I profusely disagree; it's easier to refactor a function with some ugly switch than a behavior disseminated over dozens of files.
b) "we are actually careful and not draconian about these techniques, applying them only when appropriate", which I don't agree with either, as every experience I had interacting with people that believe in clean code in meatspace was an obsession with having things done their way, and assertions about "code smells" which were literally just not doing whatever they wanted.
maybe there's a c) I'm not seeing. But this thread on its own is already strong evidence that these two notions are clearly there in the CC community.
"we are actually performant" becomes "well, actually we are readable" becomes "well, actually we are testable" becomes "well, actually humm... just shut it and write on our code style". Some posts on this thread even talk about people not using these guidelines being evil and having to be ejected. Just imagine how people not suck up on this ideology look at it.
It doesn't have to be slow. There's a reason "Clean Code" is being criticized every time it's mentioned. It touts inheritance and polymorphism as a solution to everything like it's 2002. There have been enough "Inheritance considered harmful" articles to toss that aside.
The take on switch statements is covered in "The Pragmatic Programmer" as well, which coincidentally is much less criticized when it comes to books about clean code.
The way to fix the switch is getting rid of the class "shape" and making it an interface, then implementing the interface in each shape, as a non-virtual method. And then you don't let people inherit. They can compose instead.
Performance is unaffected, you get rid of the switch, the compiler catches your mistakes for you, and everyone's happy.
Wtf? First time I hear about this one, and it sounds like a dumb dogma.
> It’s a base class for a shape with a few specific shapes derived from it: circle, triangle, rectangle, square. We then have a virtual function that computes the area.
Quite literally the first and simplest example for why you should prefer composition over inheritance[^1] (that and ducks and chickens).
Maintainability and performance are often at odds, but that doesn't mean you should throw out one for the other in every case, and I don't think that's what people like Robert C. Martin were ever intending with Clean Code.
It's like database denormalization: it may violate normalization principles, but applied to a well-designed database, with a proper understanding of the implications, it is a valid optimization technique.
More importantly though, we are willing to sacrifice raw performance for developer experience and higher maintainability because developer time is expensive, and most stakeholders would prefer that you can add feature xyz in a reasonable time, over feature xyz running marginally faster. If ease of development and maintenance weren't important, we'd just write everything in assembly and bypass all these abstractions altogether.