“Clean” code, horrible performance

mabbo · on Feb 28, 2023

I think the author is taking general advice and applying it to a niche situation.

> So by violating the first rule of clean code — which is one of its central tenants — we are able to drop from 35 cycles per shape to 24 cycles per shape

Look, most modern software is spending 99.9% of the time waiting for user input, and 0.1% of the time actually calculating something. If you're writing a AAA video game, or high performance calculation software then sure, go crazy, get those improvements.

But most of us aren't doing that. Most developers are doing work where the biggest problem is adding the next umpteen features that Product has planned (but hasn't told us about yet). Clean code optimizes for time-to-market for those features, not for the CPU doing less work.

yashap · on Feb 28, 2023

100%, I’ve done tonnes of (backend) performance optimization, profiling, etc. on higher level applications, and the perf bottlenecks have never been any of the things discussed in this article. It’s normally things like:

- Slow DB queries

- Lack of concurrency/parallelism

- Lack of caching/memoization for some expensive thing that could be cached

- Excessive serialization/deserialization (things like ORMs that create massive in memory objects)

- GC tuning/not enough memory

- Programmer doing something dumb, like using an array when they should be using a set (and then doing a huge number of membership checks)
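To make the last bullet concrete, here is a minimal sketch of the array-versus-set membership problem. The function names (`count_hits_vector`, `count_hits_set`) are hypothetical and only for illustration; the point is the difference between a linear scan per lookup and a hash lookup.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper: counts how many queries appear in `haystack` using a
// linear scan per query. Each lookup is O(n), so the whole call is O(n * m).
std::size_t count_hits_vector(const std::vector<int>& haystack,
                              const std::vector<int>& queries) {
    std::size_t hits = 0;
    for (int q : queries) {
        if (std::find(haystack.begin(), haystack.end(), q) != haystack.end()) {
            ++hits;
        }
    }
    return hits;
}

// Same result, but the hash set is built once, so each lookup is O(1) on
// average and the whole call is roughly O(n + m).
std::size_t count_hits_set(const std::vector<int>& haystack,
                           const std::vector<int>& queries) {
    const std::unordered_set<int> seen(haystack.begin(), haystack.end());
    std::size_t hits = 0;
    for (int q : queries) {
        if (seen.count(q) != 0) {
            ++hits;
        }
    }
    return hits;
}
```

For a handful of elements the vector version can still be faster in practice, so this is the kind of change to confirm with a profiler rather than assume.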

With that being said, I have worked on the odd performance optimization where we had to get quite low level. For example, when working on vehicle routing problems, they’re super computationally heavy, need to be optimized like crazy and the hot spots can indeed involve pretty low level optimizations. But it’s been rare in the work I’ve done.

This article is probably meaningful for people who work on databases, games, OSes, etc., but for most devs/apps these tips will yield zero noticeable performance improvements. Just write code in a way you find clean/maintainable/readable, and when you have perf issues, profile them and ship the appropriate fix.

p1necone · on Feb 28, 2023

Casey Muratori knows a lot about optimizing performance in game engines. He then assumes that all other software must be slow because of the exact same problems.

I think the core problem here is that he assumes that everything is inside a tight loop, because in a game engine that's rendering 60+ times a second (and probably running physics etc at a higher rate than that) that's almost always true.

Also the fact that his example of what "everyone" supposedly calls "clean code" looks like some contrived textbook example from 20 years ago strains his credibility.

Edit: come to think of it, the only person I know of who actually uses the phrase "clean code" as if it's some kind of concrete thing with actual rules is Uncle Bob. Is Casey assuming the entire commercial software industry === Uncle Bob? It's like he talked to one enterprise java dev like 10 years ago and based his opinion of the entire industry on them.

teki_one · on Feb 28, 2023

The thing that sets him off is that he is using a computer with enormous computing power and everything is slow.

He does have a narrow view, but it does not make his claims invalid.

I liked that his POC terminal made in anger made the Windows Terminal faster. But even in that context it was clear that by making some tradeoffs - which the Windows Terminal team can not make (99.99% of users do not run into the issue, but Windows has to support everything) - it could be even a lot faster.

So we live in a world where we cater to many 1% use cases that do not overlap, and that slows everyone down.

Many gamedevs do their own tools, because they are fed up with how slow iteration is. The same thing is happening at bigger companies, at some point productivity starts to matter and off the shelf solutions start to fail.

sangnoir · on March 1, 2023

> The same thing is happening at bigger companies, at some point productivity start to matter and off the shelf solutions start to fail

This is perhaps the third time I've posted this on HN, but what you describe is the circle of life for widely-used software projects. Large tech companies are not immune to it, resulting in frequent component rewrites, deprecations and almost-drop-in replacements that shuffle complexity up or down the stack.

Step 1: Developer is fed up by how slow/bloated current incumbent is, so they write a fast, lean and mean project that solves their problems

Step 2: The project becomes popular on its merits, rakes in stars on Github as people discover how awesome it is

Step 3: Users start discovering limitations for their use cases, issues and pull requests pour in

Step 4: Thousands of PRs later, the project is usable by most people and has "won". It is now the incumbent: no longer as fast as it once was, but shipping functionality that caters to many niche needs

Step 5: Go to step 1

tysam_and · on March 1, 2023

I've started a project that has the potential to go down this path, but I have a very strict 'implement your own diffs if you want them' policy for it.

Like, feel free to fork away if you like. The core repository needs to be simple and stay true to its goals, and when it updates everyone downstream can update if they want to do that to themselves. But for what it is and does, maybe the project as it is is good enough.

I start feeling almost physically sick when I see the potential for bloat to creep into the software I write. This makes working with scaled software development with others particularly hard, however.

dragonelite · on March 1, 2023

This is so true, especially for frontend development, and also for backend frameworks, though to a lesser extent.

But I also think that the product, library or framework owner should really box in the project, reject wild growth of features, and prevent generalisation of its usage.

praptak · on March 1, 2023

See Phoenix browser, now Mozilla Firefox.

Ultimatt · on March 1, 2023

Oh man Phoenix those were the days, 2003. I imagine a lot of HN readers here were only just born.

SiempreViernes · on March 1, 2023

Don't forget Firebird!

Archelaos · on March 1, 2023

Lynx on the C64 -- Is more nostalgia possible?

whstl · on March 1, 2023

"But even in that context it was clear that by making some tradeoffs"

It was literally a weekend POC, and Casey Muratori even went beyond the POC part and fixed some emoji/foreign language bugs that were present in the Terminal.

Also, his intent was not to replace the Terminal. His intent was to demonstrate that it was possible to do the optimization in the way he suggested. Originally a Microsoft PM dismissed his suggestions and claimed it would be a "doctoral thesis project" or something.

All this "yeah it's a narrow view" is just moving the goalposts more and more. Not only does he have to do a "doctoral thesis project" in a few days, he also has to completely replace a tool that's already written, bells and all? Where does it stop?

nerdponx · on March 1, 2023

But how much of that slowness is due to code that values "cleanliness" excessively? I bet that if you look at the source of nearly any application on your PC, it will be very much not clean on average.

ummonk · on March 1, 2023

I think it would certainly value the kind of "clean" design patterns (or anti-patterns as I consider most of them) that object-oriented programming evangelists espouse.

nerdponx · on March 1, 2023

Fine, but is that likely to be the cause of these applications generally being slow?

knighthack · on March 1, 2023

It wasn't about the tradeoffs that the Windows Terminal team "can not make" - it was about alternative optimizations and performance concerns that they arrogantly refused to consider as being possible, and which ought to have been considered if they were being properly competent.

Engineering tradeoffs are real. But hiding behind them every time when it can be pointed out that they don't actually apply - and when demonstrated with concrete evidence - is another thing altogether.

kadoban · on March 1, 2023

> The thing that sets him off is that he is using a computer with enormous computing power and everything is slow.

If that's his complaint, then "clean code" isn't the problem. The problem is capitalism and/or human nature.

Once something performs acceptably well, i.e. good enough to sell, performance isn't going to get any better. Flashy stuff and features get you money, going from 400ms to 100ms gets you...nothing.

sgarland · on March 1, 2023

> going from 400ms to 100ms gets you...nothing.

According to Amazon [0] that'd be a 3% gain in sales (assuming the effect of getting faster mirrors that of getting slower, anyway).

[0] https://www.gigaspaces.com/blog/amazon-found-every-100ms-of-...

kadoban · on March 1, 2023

That's if it's in your sales flow, not if it's in your software.

mtrower · on March 1, 2023

Quite often the issue isn't 400ms vs 100ms, it's literally seconds vs single-digit ms.

> The problem is capitalism and/or human nature.

Fundamentally, yes.

raspberry1337 · on March 1, 2023

[flagged]

rando14775 · on March 1, 2023

[flagged]

kunley · on March 1, 2023

The industrialization of the Soviet Union was done at the cost of massive hunger, the murders of the Great Purge, and disturbance to the neighboring countries (I am talking about pre-1939 now; WW2 Soviet atrocities are not even scratched here), resulting in millions of deaths and bringing misery to generations, in some areas until today.

dsego · on March 1, 2023

I want my car to have a subscription to nice heated seats when I am stuck in traffic after working unpaid overtime.

raspberry1337 · on March 1, 2023

>Cars weren't as big a priority for the communists. They were right too, cars and car infrastructure are super inefficient.

How to say that you're a city kid, without saying you are a city kid.

CorrectHorseBat · on March 1, 2023

Well yes, rural infrastructure is super inefficient compared to urban infrastructure.

aforwardslash · on March 1, 2023

> He does have a narrow view, but it does not make his claims invalid.

I would tend to disagree on this, especially when the claims come from the gamedev world. Games are presented as finished pieces (even when they aren't), and not just a release milestone. Ideally, a game is a one-off effort where you write a piece of code and if you're lucky, you won't have to touch it again. So, doing one-off optimizations instead of focusing on milestones and long-term maintainability of the code is not only a possibility, but actively encouraged. That's why, until rather recently (20 or so years ago), assembly optimization was common for critical execution paths, if not for most of the product.

Most of the rest of the software doesn't work like that. You often implement something that will be maintained, modified, extended and reiterated on for several years, not by you, but by several other teams with totally different experience and backgrounds. Or decades. Doing some fancy trick to skip a cleaner, extensible, maintainable design because you shaved off a couple of cycles on it is literally burning your employer's money and potentially causing huge issues in terms of maintainability, as many programs don't actually rely on a happy path like games do.

The main reason modern systems are slow isn't (just) because programmers are lazy - it's because most software, unlike games, has compatibility and maintainability requirements and, more often than not, huge legacy support. Also, in these systems, most development time is actually spent maintaining and extending existing code, not writing new code.

The author's assertion is fundamentally wrong, because software engineering is about much more than performance - even when it matters. Flashback to the beginning of the 90's, and "every game" used Bresenham's algorithm to skip usage of the (slow or non-existent) div instruction. In some cases, a couple of bitwise shifts would also eliminate mul operations. These implementations were in some cases 2-4x faster than the classical counterparts, on a 12-40MHz machine. Two cpu generations later, the Pentium comes out, and both mul and div take 1 clock cycle. The fancy pants implementation is now 3-5x slower at the same speed. Except now the cpu clock is 4x faster and shoveling around registers may actually impede parallel execution of code. All of this in a 5-year window. I envy the relatively stable instruction set of the last decade, where everything is sort-of predictable and assertions of speed can be made on code with a relatively high degree of confidence, but the reality is, silicon is cheap, and for most applications, performance is gained not by throwing away what makes some huge applications barely maintainable, but by deploying hardware. New, fancy, faster, cheaper and more economical hardware. Choosing a single metric (performance) and an instance in time to bitch about something is actually a disservice to the community at large.
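For readers who have not seen it, this is roughly what the division-free trick looks like. It is a sketch of the textbook integer-only form of Bresenham's line algorithm, not code from any particular game; the function name `bresenham_line` and the choice to return the points in a vector are assumptions made for illustration.

```cpp
#include <cstdlib>
#include <utility>
#include <vector>

// Integer-only Bresenham line rasterization (textbook form). Returns every
// grid point on the line from (x0, y0) to (x1, y1). The loop uses only
// additions, subtractions, comparisons and one doubling: no division and no
// floating point.
std::vector<std::pair<int, int>> bresenham_line(int x0, int y0, int x1, int y1) {
    std::vector<std::pair<int, int>> points;
    const int dx = std::abs(x1 - x0);
    const int dy = -std::abs(y1 - y0);
    const int sx = (x0 < x1) ? 1 : -1;  // step direction in x
    const int sy = (y0 < y1) ? 1 : -1;  // step direction in y
    int err = dx + dy;                  // running error term

    while (true) {
        points.emplace_back(x0, y0);
        if (x0 == x1 && y0 == y1) break;
        const int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }  // step in x
        if (e2 <= dx) { err += dx; y0 += sy; }  // step in y
    }
    return points;
}
```

A naive version would compute the slope with a division and round a floating-point y for every x; the error term above replaces that with integer bookkeeping.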

sgarland · on March 1, 2023

> Two cpu generations later, the Pentium comes out, and both mul and div take 1 clock cycle.

Where are you getting this information? Agner[0] lists DIV as taking 17 cycles at best (8-bit operand already in a register) on the P5, and MUL as taking 11 cycles. Even Tiger Lake takes 6 cycles for DIV.

There are ways [1] to beat that, but I don't think you can get it down to a single cycle.

[0]: https://www.agner.org/optimize/instruction_tables.pdf p.162

[1]: https://lemire.me/blog/2019/02/08/faster-remainders-when-the...

aforwardslash · on March 1, 2023

You are completely right, I just had a major brain fart. I probably mixed up some things (or had some bad/incomplete source at the time). More than two decades have passed, so it's probably me with wires crossed.

AlchemistCamp · on March 1, 2023

> “Ideally, a game is a one-off effort where you write a piece of code and if you're lucky, you won't have to touch it again.”

Ideally a game pulls in over a billion dollars per year, every year for over a decade. Think World of Warcraft or Fortnite, not Flappy Bird.

aforwardslash · on March 1, 2023

And the amount of money changes the fact that the core engines are written as a one-off effort how, exactly? Updates to the scripting engine to fix playability issues and content updates aren't really heavy software refactoring. Sure, there are usually some actual code bugfixes on the initial releases, more often than not related to someone implementing some really clever trick that raises an exception on some cpu. It's not like they are incrementally rewriting and extending the internal engine for a decade, as happens e.g. with a browser.

teworks · on March 2, 2023

If you think that, you're not familiar enough with modern games as a service. Fortnite lives on the latest version of Unreal Engine, and Unreal Engine changed a lot from the initial Fortnite development until now with many new features, rewritten parts, and major refactoring of other parts. It's huge and constantly evolving, so it is similar to, e.g., browsers.

plorkyeran · on March 1, 2023

"Games are presented as finished pieces" is an idea that's at least a decade out of date. The industry has gone very hard on the idea of Games as a Service and it's now normal for AAA games to receive years of content updates.

aforwardslash · on March 1, 2023

From a software perspective, they are. Most games don't change requirements (or base code) during their maintenance releases; these releases may fix some code bugs, but more often than not they provide only incremental content updates. Compare that with e.g. intermediate releases of software like OpenOffice.

jiggawatts · on Feb 28, 2023

Casey makes the point that you don't have to hand-tune assembly code, but instead just write the simpler code. It's easier to write, easier to read, and runs faster too!

If there's something wrong with that advice, I can't imagine what it is...

kyralis · on Feb 28, 2023

"Easier to write, easier to read" is the part that's wrong with that advice.

It absolutely is, on toy problems like the one described in the article.

It very frequently is not when embedded in much larger domains as part of large projects maintained over years by teams.

whstl · on March 1, 2023

As a counterpoint: "Clean Code", at least the variant from the book, is very frequently also extremely difficult to write or to read in larger codebases too.

The claim that "Clean Code" scales better or allows for more maintainable software hasn't been proven by anyone, and everyone with enough experience has worked with several counter examples.

The problem of code maintainability is not solved by this coding philosophy.

singron · on March 1, 2023

Sometimes people forget that "Clean Code" is a book, and it's not exactly stellar. This is a pretty thorough tear-down: https://qntm.org/clean

I'm totally onboard with prioritizing readability over performance for most code, but the style in the book has a lot more tradeoffs than it discloses, and you often don't really appreciate that until you are trying to debug if/how A transitively calls Z in a 10 million line codebase.

It's really hard to have a constructive conversation about this though since it's so subjective, any example is too trivial, and any real system is too large.

DeathArrow · on March 1, 2023

Do you have any example of an open source project written in the simple style suggested by Casey Muratori that is very difficult to maintain and that you think would benefit from "clean code", "SOLID", design patterns and abstractions on top of abstractions?

If you can't point to an actual example, I don't think you have a solid case.

LorenPechtel · on March 1, 2023

This. Clean code isn't about writing stuff like this. For what he's talking about the overhead of polymorphism is a major part of the total cost and his case is simple enough that there's little value.

However, the bigger your task gets, the more value there is in polymorphism and, in general, the smaller the percentage of total time that goes to the polymorphism overhead.

And note that his attack is only on polymorphism, not the other aspects of clean code. I strongly suspect the compiler optimizes away much of the clean stuff I do but I have never checked. I also find profiling easier on cleaner code, it makes it very obvious where the time sink must be and thus what warrants expending effort to improve. Profiling almost always shows the vast majority of time going into the unavoidable (say, disk reads) and a small number of other routines. Spend your optimization effort on the spots that need it because 99+% of your code doesn't run often enough for it to matter.

jrumbut · on March 1, 2023

And when you combine inadequate abstractions with programmers who aren't the kind of geniuses brought in to optimize game engines you get very difficult to fix performance problems.

One of the nice things about some of the clean code concepts he uses is that (as he shows) you can tactically step back from them in key, performance critical areas and reap these wins.

If you stay too low level you get lots of tangled spaghetti code with major performance problems and no obvious way forward besides "make it better."

whstl · on March 1, 2023

I find it a bit disingenuous to call what Casey Muratori is doing "staying in the low level".

Using procedures/functions is not exactly "low level". Using switch is not low level. Lookup tables are something you have to do in high level code all the time.

Sure he could have used much better variable naming (CTable?) and probably documentation, but code-wise there's nothing that screams low level there.

jrumbut · on March 1, 2023

I'm not sure how it's disingenuous, I sincerely believe what I said and I'm not trying to fool anyone.

I would consider most of his replacements lower level than the typical clean code practices he critiques (especially the ones like iterators that he mentions but avoids in order to steel man the clean code side a little bit), though not the lowest level possible. They take into account how the machine actually works and avoid additional indirection, which is why they perform better.

ablob · on March 1, 2023

His code translates relatively straightforwardly into Haskell. Do you think Haskell is a low-level language, too?

Take Listing 27, getAreaUnion for example:

  f32 const CTable[Shape_Count] = {1.0f, 1.0f, 0.5f, Pi32};
  f32 GetAreaUnion(shape_union Shape)
  {
      f32 Result = CTable[Shape.Type]*Shape.Width*Shape.Height;
      return Result;
  }

Is represented quite straightforwardly:

  {-# LANGUAGE OverloadedRecordDot #-}
  
  data Shape
    = Square    { width :: Float, height :: Float }
    | Rectangle { width :: Float, height :: Float }
    | Triangle  { width :: Float, height :: Float }
    | Circle    { width :: Float, height :: Float }
    
  cTable :: Shape -> Float
  cTable shape = case shape of -- The "lookup table" or "array"
    Square    {} -> 1
    Rectangle {} -> 1
    Triangle  {} -> 0.5
    Circle    {} -> pi
  
  getAreaUnion :: Shape -> Float
  getAreaUnion shape = cTable(shape) * shape.width * shape.height

Although it is typically easier to abstract in a "high level" language, abstraction does not require it. This whole debate is rooted in false assumptions and the need to take a side, imo. Casey has a point; it is just ignored in typical hand-wavy fashion. "The toy example doesn't scale" is a poor argument, especially when what we can observe is slow software.

The stuff proposed in the post is not rocket science, it is a very straightforward implementation of tagged unions. Instead of fetching a vtable and jumping to a value there, he proposes to branch on the tag. This is essentially dynamic dispatch on a known set of types.
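The two dispatch styles described above can be put side by side. This is a minimal sketch, not the article's code: the type names (`ShapeBase`, `RectangleV`, `TriangleV`, `ShapeTagged`) and the two-shape set are assumptions chosen to keep it short.

```cpp
#include <memory>
#include <vector>

// --- Virtual dispatch: each call goes through the object's vtable. ---
struct ShapeBase {
    virtual ~ShapeBase() = default;
    virtual float area() const = 0;
};
struct RectangleV : ShapeBase {
    float width, height;
    RectangleV(float w, float h) : width(w), height(h) {}
    float area() const override { return width * height; }
};
struct TriangleV : ShapeBase {
    float base, height;
    TriangleV(float b, float h) : base(b), height(h) {}
    float area() const override { return 0.5f * base * height; }
};

float total_area_virtual(const std::vector<std::unique_ptr<ShapeBase>>& shapes) {
    float sum = 0.0f;
    for (const auto& s : shapes) sum += s->area();  // indirect call per shape
    return sum;
}

// --- Tagged union style: a closed set of kinds, dispatched on a tag. ---
enum class ShapeType { Rectangle, Triangle };
struct ShapeTagged {
    ShapeType type;
    float width, height;
};

float area_tagged(const ShapeTagged& s) {
    switch (s.type) {  // branch on the tag instead of jumping via a vtable
        case ShapeType::Rectangle: return s.width * s.height;
        case ShapeType::Triangle:  return 0.5f * s.width * s.height;
    }
    return 0.0f;  // only reached if the tag holds an invalid value
}

float total_area_tagged(const std::vector<ShapeTagged>& shapes) {
    float sum = 0.0f;
    for (const auto& s : shapes) sum += area_tagged(s);
    return sum;
}
```

Both versions compute the same totals. The tagged version also stores the shapes contiguously by value, which may matter as much for speed as the dispatch mechanism itself.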

Additionally, he shows that this can result in speedups greater than a factor of 1. Any program that wants low latency or high throughput can profit from this observation.

This way of programming is by no means the one to rule them all. It has different advantages and drawbacks; none of which have anything to do with the perceived intelligence of the programmer or later consumers, for that matter.

An objective disadvantage of this style is that the program can't interface, as a caller, with code that hasn't been written yet. Another disadvantage is that the size of the tagged union is defined by its largest "subclass".
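The size drawback is easy to see with a plain union. In this sketch the payload structs (`CircleData`, `RectData`, `PolygonData`) are hypothetical; the only claim is that every value of the union is at least as large as its largest member.

```cpp
#include <cstddef>

// Hypothetical payloads of very different sizes.
struct CircleData  { float radius; };
struct RectData    { float width, height; };
struct PolygonData { float xs[8]; float ys[8]; int count; };

// Every ShapeData occupies enough space for the largest member.
union ShapeData {
    CircleData  circle;
    RectData    rect;
    PolygonData polygon;
};

enum class Kind { Circle, Rect, Polygon };
struct Shape {
    Kind kind;       // the tag
    ShapeData data;  // the payload, sized for the worst case
};

static_assert(sizeof(ShapeData) >= sizeof(PolygonData),
              "a union is at least as large as its largest member");

// Bytes that a circle carries around without using them.
constexpr std::size_t bytes_unused_by_circle() {
    return sizeof(ShapeData) - sizeof(CircleData);
}
```

If most shapes are circles, an array of `Shape` spends most of its memory on padding, which is one reason to keep rarely used, large variants behind a pointer or in a separate array.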

In the end, what he has shown is that speed is often a compromise made unnecessarily. This doesn't really have to do with clean code anymore, as I can see how a compiler could implement what he is angry about with virtual functions in every situation where his style is applicable.

Casey has had a similar thing about the Windows Terminal, and somewhere in his videos a different, yet arguably worse, problem comes up: a lot of libraries do not care enough about performance. If you write a program and care, you may run into the problem that the library you use is your bottleneck. If this library is hard to replace (imagine needing a rocket scientist), then you are done for. In that specific case it was DirectWrite and some other Windows API that were slow. So if, for one reason or another, the Windows team was required to use both, they'd have a hard limit on how fast they could go, just due to that. There is no "being smart" or "requiring a genius" involved in the forced/strongly recommended library here.

dsego · on March 1, 2023

It could in fact sometimes be easier to refactor low level code, than to dig yourself out of bad leaky abstractions.

https://caseymuratori.com/blog_0015

Aeolun · on Feb 28, 2023

> If there's something wrong with that advice, I can't imagine what it is...

It will start getting really annoying when you try to add shape ‘hexagon’ and need to figure out all the places where a shape can potentially be used, just so you can update the switch statements.

andyferris · on Feb 28, 2023

Many languages provide unions or sum types along with exhaustiveness checking to make this very easy (frequently not OO inheritance-based languages, though).
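C++17 can approximate this with `std::variant` and `std::visit`. The sketch below is one way to do it, assuming three hypothetical shape structs; the `overloaded` helper is the common idiom for combining lambdas, not something from the article.

```cpp
#include <variant>

struct Circle    { float radius; };
struct Rectangle { float width, height; };
struct Triangle  { float base, height; };

// A closed sum type: a Shape is exactly one of these alternatives.
using Shape = std::variant<Circle, Rectangle, Triangle>;

// Common C++17 idiom: merge several lambdas into one overload set.
template <class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template <class... Ts> overloaded(Ts...) -> overloaded<Ts...>;

// If an alternative is added to Shape and no lambda here accepts it,
// this std::visit call stops compiling, which acts as the exhaustiveness check.
float area(const Shape& shape) {
    return std::visit(overloaded{
        [](const Circle& c)    { return 3.14159265f * c.radius * c.radius; },
        [](const Rectangle& r) { return r.width * r.height; },
        [](const Triangle& t)  { return 0.5f * t.base * t.height; }
    }, shape);
}
```

Adding a `Hexagon` alternative then produces a compile error at every `std::visit` that does not handle it, which answers the "figure out all the places" concern at the cost of less readable error messages than languages with built-in pattern matching.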

crabbone · on March 1, 2023

Why would you go with "many languages" into a thread showcasing how C++ sucks? I'm pretty sure that had the author of the video done all the same manipulations in Python, the speed difference would've been negligible.

The author of the video discovered that the C++ compiler is dumb when it comes to optimizing virtual method calls (instead of bare virtual method calls, he had to help the compiler guess the conditions under which these virtual calls could be replaced with static calls). Essentially, all that his video is saying is: "virtual calls bad, if-else good". Which is what every C++ game-dev thinks after a few years on the job. Which is amusing in how short-sighted it is, and sometimes even more amusing is discovering the "solutions" created by such C++ game-devs that are aimed at replacing C++ objects, but do it in a way that's even worse than the original C++ design (who would've thought that to be possible!?)

Ygg2 · on March 1, 2023

What happens if a library user wants to extend functionality? They can't inject their code into the library.

Olreich · on March 1, 2023

If this is going to be a library where that’s a desirable feature, architect it for that feature. In the example, one easy way would have the coefficient table be expandable/replaceable. If you really need to run arbitrary user code, then write an interface that the user will conform to and call their code. You don’t even need OOP support to do that easily, just typed function pointers.
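A rough sketch of the "typed function pointers" option follows. The `ShapeRegistry` class and its `add`/`area` methods are hypothetical names invented for this example; the idea is only that the library dispatches through a table that users can append to.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A user-supplied area routine: a plain typed function pointer, no classes.
using AreaFn = float (*)(float width, float height);

struct Shape {
    std::size_t type;  // index into the registry's table
    float width, height;
};

// Hypothetical registry: the library registers its built-in kinds, and user
// code can append more at runtime without touching library source.
class ShapeRegistry {
public:
    // Registers a new kind and returns the type id to store in Shape::type.
    std::size_t add(AreaFn fn) {
        fns_.push_back(fn);
        return fns_.size() - 1;
    }
    float area(const Shape& s) const {
        assert(s.type < fns_.size());  // unknown type ids are a caller bug
        return fns_[s.type](s.width, s.height);
    }
private:
    std::vector<AreaFn> fns_;
};

float rectangle_area(float w, float h) { return w * h; }
float triangle_area(float w, float h) { return 0.5f * w * h; }
```

A user would call `registry.add(my_area_fn)` and tag their shapes with the returned id. The call through the table is still an indirect call, so this trades some of the switch version's speed for the extensibility being asked about.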

piaste · on March 1, 2023

Yeah, it's very awkward. Your best option is to leave a 'hole' case where someone may provide a 'data type' set of functions satisfying an interface, and the library author simply calls them. Effectively you're adding an OO escape clause, but it's ugly and will break user code when you add more functions and grow the interface.

Conversely, in a codebase organized by objects it's not clean to add an extra method to the base class and each subclass. You have to write an external function and switch over every known subclass inside it, which is also very ugly and will also break when you add more subclasses.

The two designs are actually the duals of each other. Someone compared it to rows vs columns and it's a great comparison.

In OO, the methods are columns and each new row is a new subclass implementing them.

In FP, the types are the columns and each new row is a function that switches over the possible types.

sarchertech · on March 1, 2023

Bob Nystrom has a good article on this (it’s called the expression problem)

https://journal.stuffwithstuff.com/2010/10/01/solving-the-ex...

He also discusses it in his book Crafting Interpreters.

GrumpySloth · on Feb 28, 2023

Even in C, the compiler will emit warnings for unhandled enum cases in switch statements as long as you don’t provide a default case (as you shouldn’t).
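A small sketch of that pattern, written in C++ to match the article's code (the same idea applies to a C `enum`). The `ShapeKind` enumerators mirror the four shapes from the article; the function name `coefficient` is an assumption.

```cpp
enum class ShapeKind { Square, Rectangle, Triangle, Circle };

// No `default:` label on purpose. With GCC or Clang, -Wswitch (part of -Wall
// on GCC) warns at this switch if an enumerator such as Hexagon is added to
// ShapeKind without a matching case here.
float coefficient(ShapeKind kind) {
    switch (kind) {
        case ShapeKind::Square:    return 1.0f;
        case ShapeKind::Rectangle: return 1.0f;
        case ShapeKind::Triangle:  return 0.5f;
        case ShapeKind::Circle:    return 3.14159265f;
    }
    return 0.0f;  // only reached if `kind` holds an out-of-range value
}
```

Adding `default:` would silence the warning, which is why leaving it out helps when the enum and the switch are maintained together.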

e28eta · on March 1, 2023

Depends on the type of code you’re writing. If your `switch` is tightly coupled to the code that defines the cases and they’ll definitely be changed in lockstep, a default is more likely to be harmful.

If your cases are defined externally, and you need to be forwards compatible, omitting a default is wrong.

The Swift language specifically added `@unknown default` for switching over enums.

Aeolun · on March 1, 2023

Depends on your language I suppose. I haven’t worked with a ton of compiled languages.

But we can just re-up the problem by adding 100 different shapes instead of the one. Now you have switch statements with 104 cases each spread through your codebase.

GrumpySloth · on March 1, 2023

I prefer to have those 104 cases all in one place (as is the case, when it is a switch statement) rather than each in a separate file, that I need to jump around between now (as is the case with polymorphism). This situation is a bit analogous to organising things column-wise vs row-wise. And in practice I find that I need to jump around a lot less with code that uses switches than with code that uses polymorphism. Tangentially, the latter is also more prone to turning into spaghetti, as the whole is obscured by indirection levels between the parts, but you don't see the spaghetti until you try to step through the code, when debugging an issue or just trying to familiarise yourself with a new codebase.

camgunz · on March 1, 2023

Conversely it's really annoying to add a new method to each shape--you've got to open them all up and add shape-specific code for them with a bunch of boilerplate. With switch statements you just add one more function.

This is the "expression problem": https://en.m.wikipedia.org/wiki/Expression_problem

jbverschoor · on Feb 28, 2023

switch isn't 'invented' for enums. switch is a low-level construct which mimics several goto's / jumps. It's just as bad, except for the case where you have either: a) multiple behaviors for the same value, b) need to pass through (not break). b) is the nr 1 reason I hate switches, and my nr 2 reason is that most languages don't support proper enums, and will fail when you don't handle all possible values
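For reference, this is what deliberate pass-through looks like in C++17. The `Level` enum and the permission counts are made up for illustration.

```cpp
enum class Level { Guest, User, Admin };

// Each level also gets everything the levels below it get, so the cases fall
// through on purpose. [[fallthrough]] (C++17) documents the intent and keeps
// -Wimplicit-fallthrough quiet; an accidental missing `break` would otherwise
// look exactly the same to a reader.
int permission_count(Level level) {
    int count = 0;
    switch (level) {
        case Level::Admin:
            count += 3;  // admin-only permissions
            [[fallthrough]];
        case Level::User:
            count += 2;  // user permissions
            [[fallthrough]];
        case Level::Guest:
            count += 1;  // guest permissions
            break;
    }
    return count;
}
```

With the attribute and the compiler warning enabled, unintended fallthrough becomes a diagnosed mistake rather than a silent one.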

EVa5I7bHFq9mnYK · on March 1, 2023

Switch was invented because it allows replacing several ifs and gotos with a precomputed static jump table, an optimization trick.

radiator · on March 1, 2023

Switch is not bad, much less is there any reason to hate it.

smnplk · on March 1, 2023

Agree. It's the same as saying don't use for loops or any other basic language constructs. Switch is very useful, please leave switch alone! You will not take switch away from me :)

jbverschoor · on March 1, 2023

When do you actually pass through? Only some state machines do that. But even in that case, ifs are clearer and less error-prone, given missing breaks, braces/scoping issues, etc.

_2uwr · on March 1, 2023

> It's easier to write, easier to read, and runs faster too!

It's still possible to get a bottleneck in assembler.

Whatever language is used, executable code still needs to be profiled using tools as described here.

https://en.wikipedia.org/wiki/Profiling_(computer_programmin...

saagarjha · on Feb 28, 2023

The problem is that Casey has a very particular definition of simple which is problematic to apply in many cases.

whstl · on March 1, 2023

Does he? If anything, it is the definition of "Clean Code" that is somewhat special compared to previous usage of OOP and other paradigms.

Casey's definition of simple actually reminds me of the cliché by Rich Hickey. It's simple, but it's not necessarily easy.

saagarjha · on March 1, 2023

I don’t recommend adopting “Clean Code” either for similar reasons.

whstl · on March 1, 2023

Fair enough!

DalekBaldwin · on March 1, 2023

> Is Casey assuming the entire commercial software industry === Uncle Bob?

It's uncharitable to take Casey as making absolute blanket statements like that, but still, it would not be unreasonable for him to single out Uncle Bob in particular.

The Amazon rankings for Bob Martin's "Clean Code":

Best Sellers Rank: #5,338 in Books (See Top 100 in Books)

#1 in Software Design & Engineering

#2 in Software Testing

#4 in Software Development (Books)

Agentlien · on March 1, 2023

This comment helped make sense of this whole comment section for me.

I work in game development, largely with optimisation. I mostly work with GPU optimisation, which is a whole different beast. On the CPU, most of the time issues are either trying to do too much stuff in a hot loop (rendering stuff that could have been culled, putting physics on objects that don't need it,...) or doing something in a slightly inefficient way in a hot loop. Because everything in the game is indeed a loop consisting of a series of hot loops.

People in this comment section call his example contrived, but it's very similar to one of the biggest performance improvements I've seen in practice.

LorenPechtel · on March 1, 2023

Hot loops are where you spend your optimizing efforts. If you're going through that list of shapes again and again it very well might be worthwhile to cache some data and provide the objects with a way to update the cache.
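
A minimal sketch of that idea, assuming a hypothetical `ShapeList` container and `Rect` shape (neither name comes from the article): the list keeps a cached total area, and each shape tells the list when it changes so the cache is adjusted rather than recomputed.

```python
class Rect:
    """Hypothetical shape that reports area changes to its owner."""

    def __init__(self, width, height):
        self._width = width
        self._height = height
        self._owner = None  # set by ShapeList.add

    def area(self):
        return self._width * self._height

    def resize(self, width, height):
        old = self.area()
        self._width, self._height = width, height
        if self._owner is not None:
            # Tell the owner how much the area changed.
            self._owner.area_changed(self.area() - old)


class ShapeList:
    """Keeps a running total so the hot loop never re-sums every shape."""

    def __init__(self):
        self._shapes = []
        self._total_area = 0.0

    def add(self, shape):
        shape._owner = self
        self._shapes.append(shape)
        self._total_area += shape.area()

    def area_changed(self, delta):
        self._total_area += delta

    def total_area(self):
        return self._total_area  # O(1), no loop over the shapes


shapes = ShapeList()
r = Rect(2, 3)
shapes.add(r)
shapes.add(Rect(1, 1))
r.resize(4, 3)
```

Whether this pays off depends on how often shapes change compared with how often the total is read.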

Agentlien · on March 2, 2023

There are a lot of clever techniques already in play to minimise the amount of data you need to consider.

Still, each triangle's position, shape, and other properties can change each frame, as does that of the camera. So you cannot avoid doing some amount of work for each of the visible triangles and their vertices each frame.

Since you need to update the screen at a consistent frequency (typically 30 or 60 times a second) and the list of triangles that actually need to be rendered each frame is in the millions... Well, that's a lot of work which cannot be avoided.

bccdee · on March 1, 2023

Bob Martin has made a real effort to tie himself to the specific phrase "clean code." If the author of this article had referred to clean code without using quotation marks, or talked about managing software complexity using any other term, you'd be right, but I think he's specifically talking about Bob Martin's "Clean Code" and the school of object-oriented philosophy that cleaves close to his beliefs.

noobermin · on March 1, 2023

The fact that you think that "everything is inside a tight loop" doesn't apply to all code today already shows your own model of code is broken because you believe the syntactic sugar modern programming languages and paradigms provide is actually reality. If everything wasn't in a loop, your program would halt once you're done with whatever you're calculating. Just because you have things like callbacks and things feel lazy doesn't mean that things do not really operate in a loop on a deep level, of course they do. You just are insulated from it because you write hooks only and such and you don't actually see the loop.

Believe it or not, callbacks are not like interrupts: there is a loop somewhere that checks the status of something and then runs the callback. All computer software today involves things that run in loops; you just don't see it. Web browsers do it! Of course they do.

Moreover, he didn't contrive his example, he said that he in fact used a textbook example used by the advocates for polymorphism and such.
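
The point about callbacks being driven by a loop can be sketched roughly like this. `EventLoop`, `on`, `emit` and `run` are invented names for a toy model, not the API of any real framework or browser.

```python
from collections import deque


class EventLoop:
    """Toy single-threaded loop: callbacks only run when the loop reaches them."""

    def __init__(self):
        self._pending = deque()   # (event_name, payload) pairs waiting to be handled
        self._handlers = {}       # event_name -> list of callbacks

    def on(self, event_name, callback):
        self._handlers.setdefault(event_name, []).append(callback)

    def emit(self, event_name, payload=None):
        # Emitting only queues the event; nothing runs yet.
        self._pending.append((event_name, payload))

    def run(self):
        # The loop that is usually hidden inside the framework or runtime.
        while self._pending:
            event_name, payload = self._pending.popleft()
            for callback in self._handlers.get(event_name, []):
                callback(payload)


log = []
loop = EventLoop()
loop.on("click", lambda payload: log.append(("clicked", payload)))
loop.emit("click", "button-1")
log_before_run = list(log)   # still empty: the callback has not run yet
loop.run()
```

The callback does nothing until `run()` gets to it, so a slow callback delays everything queued behind it.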

rileymat2 · on March 1, 2023

You dropped the word "tight", it is important.

noobermin · on March 1, 2023

How so? What loop isn't "tight"?

EDIT: To expand on this, why is modern software slow? Because people, thinking their code isn't a bottleneck or that performance doesn't matter but adherence to the right abstractions does, write slow code and callbacks as if everything just happens immediately, with layers of abstractions, and those little innocent steps add up when every piece of code written today is written with that same neglect. The callbacks run slowly, queues fill, promises hang, and on and on we go.

So sure, maybe your code doesn't seem like it needs to run at 60fps. But when everything I do on a computer is written like it will be run every 10 seconds, it definitely will be noticeable. That's because I don't just run your program or look at your website, I look at 10 of them or even more. If I average all of those per time, then maybe your code should in fact be able to run at 1Hz or so or I will start to notice.

People were of course right to teach new devs not to over-optimize immediately, but the culture has swung so far in the other direction, especially since you all seem to love complexity so much, that you've managed to make computers that can calculate pi faster than a supercomputer from the 70s crawl when they render and handle an editable textbox. There has to be a move back in the other direction; you guys need to give a shit about performance, at least a little.

piaste · on March 1, 2023

The kind that runs once and then waits geological ages to hit the next iteration. For example:

    while (command != 'quit') {
       command = readline()
       handle(command)
    }

hdhrufjdi · on March 1, 2023

handle(command) runs inside another loop at the same time: the scheduler.

drewm1980 · on March 1, 2023

New developers are still watching Uncle Bob videos online and taking those strategies as the default for how you craft code, largely because there are not very many people since then making similarly grandiose claims about how software should be crafted and forming entire companies pushing adoption of those techniques commercially. We even had a young dev leave our company and form a startup around the idea of doing what we do, but a full "clean code" rewrite. Our software already has major performance issues, I'm not hopeful about the speed of his code after he layers on even more abstractions.

zamalek · on March 1, 2023

Exactly. The problems of software performance come from decades of poorly/quickly executed evolutionary change resulting in bad systems design. It's all a new abstraction over an older abstraction over an even older abstraction, because some old application still needs to be supported (something Casey has likely never had the problem of worrying about in game development).

Game developers have the luxury of starting from near-scratch every once in a while. That's exactly what his lauded handmade series is all about. I'm guessing that things wouldn't be so clear-cut if he were given a 10-year-old codebase to iterate on.

"Normal" developers see game developers as gods walking amongst us and place far more value on their opinions than they should. The truth is that game developers and "normal" developers face equally challenging problems, just different problems. As a trivial example, an experienced web developer could probably run circles around Casey in terms of elegantly accounting for browser quirks (conversely, the web developer would probably be stumped by data-oriented design). Either could learn the other's discipline, but each would have a decades-long head start on the other.

The idolization of gamedevs is extremely frustrating, especially when it comes to appeals to their authority.

ehaliewicz2 · on March 14, 2023

Even more annoying than the idolization of game devs is when you open up the "Displays" submenu of the osx system preferences application, and it takes several times longer to load than the previous major os version, several seconds!, with the only significant change being a different layout, and constantly trying to ignore how nearly everything takes so much longer than necessary, wasting so much time and energy.

I agree that not everything is like a game, but it makes me legitimately sad when it seems like nobody cares about performance (aside from a few domains).

DeathArrow · on March 1, 2023

I am a developer who has worked on embedded, desktop apps, mobile apps, games and now on a large microservice-based application. A developer is a developer and can move from one kind of application to another.

I find what Casey says in his videos to be true. And I thought about that stuff before I even watched his videos, which are excellent.

However, I started not to care. I don't want to start fights inside the company, especially fighting alone against many OOP cultists. It's not my money at stake, so if companies as a whole decide for OOP, clean code, SOLID, design patterns, abstractions on top of abstractions, turning the code bases into giant piles of junk while degrading performance, I am not going to go against the crowd.

Code that I write for myself is quite different than code I write for my employers.

I just hope that the industry as a whole will wake up from the whole OOP nightmare.

zamalek · on March 2, 2023

> I just hope that the industry as a whole will wake up from the whole OOP nightmare.

I agree with that 100%. OOP is a giant mess, no matter where your stance on code clarity or performance stands. It's objectively worse in both regards.

_aavaa_ · on March 1, 2023

> something Casey has likely never had the problem of worrying about in game development.

This is simply not true; he has in all likelihood worked on such problems, given his work at RAD, whose software has been in use for 20+ years at this point.

> The problems of software performance come from decades of poorly/quickly executed evolutionary change resulting in bad systems design.

This may be true of some code bases, but it's demonstrably false for new software that's created today. Lots of new software gets built and it's slow.

intelVISA · on March 1, 2023

> The idolization of gamedevs is extremely frustrating

Every story needs a Hero; it's inspiring when you're trapped in the CRUD gulag (until you see the TC/WLB).

zamalek · on March 1, 2023

Fair enough, but never forget that you can be your own hero. I have heard numerous accounts of hobby coding being used as a successful antidote to chore coding.

magicalhippo · on Feb 28, 2023

Then again, things aren't in the critical path until suddenly they are.

Regardless of scenario I will never willingly do an O(n^2) sort when writing new code. Just in case those 10 items suddenly turn into 10000 one day.

cturner · on March 1, 2023

"Regardless of scenario" limits your options.

If you are shipping a binary to your users that will never be able to get updates, your cautiousness would be justified. There are other situations where it will needlessly limit options.

There are situations where I have knowingly written O(n^2) or worse, and put a # xxx dragons marker by it. Quick to write, leave my options open, keep my momentum on the problem I care about.

I will grep for xxx issues at some later time. I may end up throwing out the code before that happens. If I hit big-o issues before then, I can refactor.

I once had a system where the important problems turned out to be a series of IO bottlenecks - nothing to do with computation - but that was obscured because good sense had been burnt at the altar of compute efficiency.

joaonmatos · on Feb 28, 2023

Are you even manually implementing sorts frequently?

Even languages that are notorious for having tiny libraries, like C and JS, have built-in sorts.

mattgreenrocks · on March 1, 2023

It’s less about accidentally writing n^2 sorts and more about not accidentally creating n^2 algorithms.

whstl · on March 1, 2023

Not sorting, but very simple algorithms that can't afford having O(n^2) performance? That's common even in CRUD apps.

dagw · on March 1, 2023

Perhaps not sort so much, but certainly with search I've seen people roll their own inefficient search functions many times.

joaonmatos · on March 5, 2023

It is the exact same situation, though. Most people can just chain a sort and a binary search to do it, and both are included in most languages. Or just put it into a tree map, if your language has it.
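
A small sketch of the "sort once, then binary search" approach, using Python's standard `bisect` module. The helper names `make_index` and `contains` and the sample data are made up for illustration.

```python
import bisect


def make_index(items):
    """Sort once, O(n log n); the sorted copy is reused for every lookup."""
    return sorted(items)


def contains(sorted_items, value):
    """Binary search, O(log n) per lookup."""
    i = bisect.bisect_left(sorted_items, value)
    return i < len(sorted_items) and sorted_items[i] == value


user_ids = [42, 7, 19, 3, 88]      # made-up data
index = make_index(user_ids)
found = contains(index, 19)
missing = contains(index, 20)
```

If only membership is needed and order does not matter, a plain `set` is the simpler choice; Python has no built-in tree map, so the sorted list plays that role here.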

bcrosby95 · on Feb 28, 2023

The moon doesn't fall into the ocean until it does.

TeMPOraL · on March 1, 2023

Not a fair comparison :P.

The point is that the developers may think O(n^2) is fine because their toy use cases had n=10...100, but then actual users will try to use the software for n=10k, or n=100k, and then either waste their lives working with suddenly slow software, or look for alternatives.

I walked into a case like this the other day. I wanted to do a little semi-collaborative project planning. I found a nice tool, played with it for a moment, figured it had the functionality I needed and was fast enough. Then I decided to do the actual plan. Once the number of entries in the system went from 10-20 to 30-40, I started to feel things get a little laggy. 50-60, more laggy. At this point I was committed, so I suffered the tool for a couple of months, as its UI kept breaking when handling 100 entries. If I had known this would happen at the start, I'd have looked for something else. But instead, I walked into a hidden O(n^2) somewhere, which makes me hate the product with a passion now.

chenglou · on March 1, 2023

It's more than that. The way black box composition is done in modern software, your n=100 code (say, a component) gets reused in another thing somewhere above, and now you're being iterated through m=100 times. Oops, now n=10k.

Generally, Casey seems to preach holistic thinking, finding the right mental model and just writing the most straightforward code (which is harder than it looks; people get distracted in the gigantic state space of solutions all the time). However this requires 1. a small team of 2. good engineers. Folks argue that this isn't always feasible, which is true, but the point of these presentations is to spread the coding patterns & knowledge to train the next gen of engineers to be more aware of these issues and work toward said smaller team & better engineers direction, knowing that we might never reach it. Most modern patterns (and org structures) don't incentivize these 2 qualities.

Akronymus · on March 1, 2023

> The way black box composition is done in modern software, your n=100 code (say, a component) gets reused in another thing somewhere above, and now you're being iterated through m=100 times. Oops, now n=10k.

That doesn't seem quite right, as 100 * (100^2) <<<<< 10000^2

chenglou · on March 2, 2023

Yeah I was only talking about quantities. Equivalently, assume that it's a linear algorithm in the child and a linear one in the parent. Ultimately it ends up as O(nm) being some big number, but when people do runtime analysis in the real world, they don't tend to consider the composition of these black boxes since there'd be too many combinations. (Composition of two polynomial runtimes would be even worse, yeah.)
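
To make the composition concrete, here is a toy sketch that just counts units of work. `child_render` and `parent_render` are invented stand-ins for a reusable component and whatever ends up wrapping it.

```python
def child_render(items, counter):
    """Linear in len(items): looks fine when analysed on its own."""
    for _ in items:
        counter[0] += 1   # stand-in for one unit of work


def parent_render(rows, items_per_row, counter):
    """Also linear in len(rows), but it calls the child once per row."""
    for _ in rows:
        child_render(items_per_row, counter)


n, m = 100, 100
work = [0]
parent_render(range(m), range(n), work)
# work[0] is now n * m units, although each function is linear by itself.
```

Neither function looks suspicious in isolation; the n * m cost only shows up once they are stacked.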

Basically, performance doesn't compose well under current paradigms, and you can see Casey's methods as starting from the assumption of wanting to preserve performance (the cycles count is just an example, although it might not appeal to some crowds), and working backward toward a paradigm.

There was a good quote that programming should be more like physics than math.

celrod · on March 1, 2023

There was a fun example in the Julia compiler around a year ago.

Part of the compiler was O(N^2) in `let` block nesting depth. That is

  let x = foo(), y = y, z = 2y
    ...
  end

would be a depth of 3. It didn't seem like that should be a problem, N is never going to be 10, let alone 100, right?

Until suddenly, `N` was in the thousands in some critical generated code spit out by some modeling software, so that handling the scoping introduced by `let` suddenly dominated the compilation time...

gureddio · on March 1, 2023

The "contrived textbook example from 20 years ago" still has a very real impact today. In my experience there are still lots of development teams that are instructed to develop in a "Clean code" style in the flavour of Uncle Bob. It's especially true in the .net development space, and is almost a cultural problem within .net.

As a former .net developer that was often pushed into "clean code", my big takeaway from the video was that actually, not using "clean code" techniques, such as polymorphism, made the code so much more readable and easier to grok that the optimisation that followed was completely natural.

sidlls · on March 1, 2023

There are a lot of people who advocate "clean code" principles without ever having read or knowing about Uncle Bob, because those "enterprise java dev like 10 years ago" folks sort of seeped into the industry.

It's the same thing with TDD zealots. Or any other fad driven development paradigm, which our industry is filled with.

ryukoposting · on March 1, 2023

This summarizes my impression of the article. It reads like a freshman CS TA giving a "well, akshually" speech. This is all old news- everyone knows vtables are "slow" in some very loose sense of the term. In general, these optimizations don't make a big enough difference to be worth considering while designing your code. In very particular domains, these ideas are valid, but a lot of that code is written in C so there's no dynamic dispatch anyway.

hsn915 · on Feb 28, 2023

All of these are instances of doing something _wasteful_, which is the #1 issue he mentions in the list of things that cause performance degradation.

Now, your argument seems to be: in the real world, there's so much waste, that virtual function calls pale in comparison.

This does not debunk his main point, which seems to me at least the following: all things being equal, writing code with virtual functions that do a tiny amount of work and "hiding implementation details" makes performance worse, sometimes by an order of magnitude.
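
One way to picture "many tiny dispatched calls" against "few calls that each do a lot of work", as a sketch only. Python has no vtables, so this shows the shape of the idea rather than the C++ cost model, and the class names (`Circle`, `Square`, `CircleBatch`, `SquareBatch`) are invented for the example.

```python
import math


# Fine-grained: one dynamically dispatched call per shape.
class Circle:
    def __init__(self, r):
        self.r = r

    def area(self):
        return math.pi * self.r * self.r


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


shapes = [Circle(1.0), Circle(2.0), Square(3.0)]
total_per_shape = sum(s.area() for s in shapes)   # 3 dispatched calls


# Coarse-grained: one dispatched call per batch, with a plain loop behind it.
class CircleBatch:
    def __init__(self, radii):
        self.radii = radii

    def total_area(self):
        return math.pi * sum(r * r for r in self.radii)


class SquareBatch:
    def __init__(self, sides):
        self.sides = sides

    def total_area(self):
        return sum(s * s for s in self.sides)


batches = [CircleBatch([1.0, 2.0]), SquareBatch([3.0])]
total_per_batch = sum(b.total_area() for b in batches)   # 2 dispatched calls
```

Both versions stay extensible through the same kind of interface; the second just pays for dispatch once per batch instead of once per element.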

Now, there may be situations where you _have_ to use virtual functions, because you are writing a library for other people to use, and you can't dictate ahead of time how they will use it.

This again does not invalidate the point. You need to be _aware_ of the performance implications of this, and mitigate it. He said the following in the comment section on the article:

> Try to make it so that you do very rare virtual function calls, behind which you do a _large_ amount of work, rather than the "clean" code way of using lots of little function calls.

Zvez · on March 1, 2023

>all things being equal, writing code with virtual functions that do a tiny amount of work and "hiding implementation details" makes performance worse, sometimes by an order of magnitude

But all things are not equal. You can spend a lot of time improving the performance of your function calls and get virtually nothing out of it. Because if you optimize something that takes 0.01% of overall execution time, an 'order of magnitude' performance gain is still negligible.

Also articles like this usually fail to mention code maintenance cost. For example by reducing usage of virtual calls you can make your code unmaintainable/expandable and suddenly every new change will cost you 2x more in development time.

That's why in the real world most of the time you choose clean code and you use optimized non-clean code only in places where you need it. If you look at the internals of, let's say, any web framework, you will find a lot of non-clean code, which makes the framework faster. But the interface will be done in a clean fashion, and most users of the framework will enjoy clean code without needing to care about the unclean internals.

hsn915 · on March 1, 2023

This article is not aimed at people who are working in a codebase where everything is super terrible. It's written for performance aware programming. One of the aspects of performance awareness is awareness of how virtual functions and tiny functions affect performance negatively.

> Also articles like this usually fail to mention code maintenance cost. For example by reducing usage of virtual calls you can make your code unmaintainable/expandable and suddenly every new change will cost you 2x more in development time.

This never actually happens in real life. I've never seen a codebase that is written with "clean code" principles in mind that is also maintainable and easy to develop on top of.

citrin_ru · on March 1, 2023

"Write slow code now, profile and optimize later" is how we got all this slow software, because the second step, optimization, practically never happens in my experience.

Along with the heuristic that hardware and electricity are cheap, developers are expensive. That's probably why managers in my experience almost never ask developers to optimize slow code - they believe it would be cheaper to use more hardware if this fixes the problem (in some areas like HFT or GameDev more hardware is not the answer, so optimization does happen). In the rare cases where I've seen optimization being done, the initiative always came from an IC (who knew that the code could run faster / use fewer resources).

So nowadays when writing code I assume that it will never be optimized later and try to do less dumb stuff from the beginning.

radicalbyte · on March 1, 2023

The heuristic is one of those for the mid-experience developer. That's the point where you spend two weeks re-implementing Set<T> to save 1 cycle. Or over abstract. Lots of developers never get past that point.

Experienced devs - such as yourself - generally know how to write reasonably fast code that is also clean (or easy to extend) first time. In my opinion we should be more explicit about this in the heuristics.

Zvez · on March 1, 2023

>Write slow code now, profile and optimize later is how we got all slow software because second step

We got slow software because we were ok to get slow software. If being fast is not in requirements it means it doesn't matter (for whoever is responsible for defining priorities). Places where performance matters it is never sacrificed.

_aavaa_ · on March 1, 2023

Doesn't matter for what/who? It doesn't matter from the business angle since there isn't any competition that works better. But it definitely matters for me.

I am not okay with slow software, and being forced to use it is frankly insulting. Teams (to pick a punching bag) hogging my system resources doesn't impact its adoption since I'm forced to use it. Teams could be 10x slower and I'd probably still be forced to use it. But it wouldn't be because speed doesn't matter.

curtainsforus · on March 1, 2023

This is a fully general counterargument for why anything that is bad is actually good because it doesn't matter. "Places where air quality matters it is never sacrificed".

moonchrome · on March 1, 2023

If it doesn't happen then it doesn't matter.

citrin_ru · on March 1, 2023

It doesn't matter in the sense that most companies continue to be profitable even while paying significantly more for hardware / clouds than they potentially could. But it is sad to see nevertheless. Also it increases CO₂ emissions.

fvdessen · on Feb 28, 2023

If you program using design patterns that are 10x slower, your application ends up 10x slower, even after you've optimised the hot spots away, and the profiler will not give you any idea that it could still be 10x faster.

badsectoracula · on Feb 28, 2023

If your profiler doesn't show any hotspots (which is incredibly rare in practice) and your program is still slow, it means you can simply pick any of whatever functions/methods show up near the top to optimize.

If your program isn't slow then you don't need to bother making it any faster. Even Michael Abrash, who specializes in code optimization, once explicitly wrote in his graphics programming black book that "The objective (not always attained) in creating high-performance software is to make the software able to carry out its appointed tasks so rapidly that it responds instantaneously, as far as the user is concerned. In other words, high-performance code should ideally run so fast that any further improvement in the code would be pointless [..] Notice that the above definition most emphatically does not say anything about making the software as fast as possible".[0]

[0] https://www.jagregory.com/abrash-black-book/#understanding-h...

fvdessen · on March 1, 2023

Think of using slow patterns as using a slow programming language. After you've optimised your python program and eliminated all hotspots, the python profiler isn't going to say 'hey, this could be still 10x faster if rewritten in go'. Note that if you write a graphics engine, you aren't going to use python or even go, but something more like c, c++, rust, even though your code would be cleaner in python. Clean code techniques elevate your level of abstraction and prevent some problems, but at the (significant) cost of performance. It's of course always a matter of trade-offs and this matters more or less depending on which problem you are trying to solve.

badsectoracula · on March 1, 2023

> After you've optimised your python program and eliminated all hotspots, the python profiler isn't going to say 'hey, this could be still 10x faster if rewritten in go'.

No, it won't say something like that, but if you profile Python itself and see that a lot of time is spent in Python you can get a hint that rewriting it in another language that doesn't have Python's overhead might help.

DeathArrow · on March 1, 2023

>Clean code techniques elevate your level of abstraction and prevent some problems

What problems do they prevent?

Applejinx · on March 1, 2023

Unemployment :)

If you're able to make a program into a sort of puzzle concealing its state inside a twisty maze of tiny virtual functions so it's impossible to see plainly how anything is done, then you become indispensable as the only one who's internalized how the thing works.

And then you insist it's all for easier comprehension… for those sufficiently intelligent to comprehend it.

trashtester · on March 1, 2023

Even more new clothes for the emperor.

fvdessen · on March 1, 2023

If the average programmer didn’t use ‘clean code’ techniques they would end up with crazy spaghetti, not the fast and clear code shown in the video. (the average programmer is not on HN and can’t do fizzbuzz)

jbverschoor · on Feb 28, 2023

Sure.. but the overall performance is in the cpu/vm, kernel, OS, UI, vm, electron, node, browser, javascript vm. etc etc

There are no hotspots. Computing has become lukewarm by default, and just like the globe, it's slowly heating up.

DeathArrow · on March 1, 2023

Casey does not advocate for code optimization. Not writing code in a way that is known to degrade performance does not mean you are doing optimization.

saagarjha · on Feb 28, 2023

> If your profiler doesn't show any hotspots (which is incredibly rare in practice) and your program is still slow, it means you can simply pick any of whatever functions/methods show up near the top to optimize.

No, though s/any/many/ would make this true.

boredhedgehog · on March 1, 2023

> If your program isn't slow then you don't need to bother making it any faster.

But you can only judge that for the very limited set of hardware you personally have access to. You might have many users -- or potential users -- with slower CPUs.

badsectoracula · on March 1, 2023

This is where having a target hardware comes in. Aside from doing it for fun, there isn't much of a point to care for 15 year old PCs for example.

amelius · on March 1, 2023

No, if one day you decide to run it on a less capable CPU, it's nice if the optimization work has been done.

badsectoracula · on March 1, 2023

This is the same as trying to guess the future that creates "astronautical architecture". You set a target hardware you're willing to support (e.g. PCs released in the last 10-15 years) and anything less is out of scope of the project. You don't need to support 8086 PCs for example.

(ignoring projects made for fun of course)

gjulianm · on March 1, 2023

Well, for starters, Amdahl's law exists. If you use design patterns that are 10x slower on a code path that takes only 0.5% of the execution time, your application ends up only about 0.45% slower than it could be. And the profiler never tells you how much faster a code path can be. It just tells you where you are spending most of your time so you can put the effort where it matters.
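
A quick check of that arithmetic. The comment can be read two ways, so this sketch computes both; the function names are made up, and the inputs are the 0.5% share and the 10x factor from the text.

```python
def overhead_if_share_measured_with_slow_code(share, factor):
    """`share` is the fraction of the *current* runtime spent in the slow path.

    Returns how much slower the program is than it would be if that
    path ran `factor` times faster.
    """
    fast_total = (1.0 - share) + share / factor
    return 1.0 / fast_total - 1.0


def overhead_if_share_measured_with_fast_code(share, factor):
    """`share` is the fraction of runtime the path takes *before* it is
    made `factor` times slower. Returns the resulting slowdown.
    """
    slow_total = (1.0 - share) + share * factor
    return slow_total - 1.0


reading_a = overhead_if_share_measured_with_slow_code(0.005, 10)   # about 0.0045
reading_b = overhead_if_share_measured_with_fast_code(0.005, 10)   # about 0.045
```

Under the first reading the whole program is about 0.45% slower, under the second about 4.5%; either way the effect is far smaller than the local 10x.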

trashtester · on March 1, 2023

This is precisely the kind of situation where it's imperative to consider performance before starting to build a new system.

When you hit Amdahl's law, it's because you (or someone else) has made decisions about the high level design/patterns to use in a system. To remove such bottlenecks, you may have to scrap the entire project and start over.

For the inner loops, it's perfectly fine to leave most of the optimization for later. But for overall design, it needs to be right from the start.

Time and time again, I see devs stuck in some paradigm (often front-end devs or low-volume RESTful microservices) that makes it almost impossible to handle non-trivial data volumes or traffic, causing new products to fail.

gjulianm · on March 2, 2023

I mean, you always hit Amdahl's law, and the point is that most often the time limits are not that related to the architecture. Let's say you do these "10x slower patterns" in a backend application, in the part where the DB model gets translated to a response to give the client... Yeah, maybe that pattern is slower but most of the time of the response is spent on the network and the DB response.

I do agree that overall design needs to fit performance requirements but for the most part that has nothing to do with "clean code" patterns.

seadan83 · on March 1, 2023

It's rare that it's the algorithm and not the design that is slowing you down. Though sometimes it is the algorithm slowing you down, usually it's the design.

As an example, I recently worked on a large system that was optimized to do a big data transformation in an efficient way. It turns out that data is transformed back to the original format later downstream. So much for the optimization...

All that is to say, often it is the case that "simple > fast"

Clean code at the time had a lot going for it, and it was an improvement over a lot of JavaEE code that was written in absolutely procedural ways, with less care for the developer reading the classes, functions, or individual statements than for punching out near assembly-like code and moving on.

paxys · on March 1, 2023

All of the advice in that article isn't going to bring your server latency for an API call down from 1000ms to 30ms, but rather from 30ms to 25ms. So sure, if you absolutely must optimize that 30ms call after you have fixed everything else then go ahead, but very few are at that stage or will ever get to that stage. And if you try to optimize that last 5ms at the expense of the much larger issues then you are actually making things worse.

dmitriid · on March 1, 2023

> All of the advice in that article isn't going to bring your server latency for an API call down from 1000ms to 30ms, but rather from 30ms to 25ms.

Of course it will.

If your backend service is already suboptimal, and running at 10x worse performance, optimizing that will give you, well, a 10x performance boost.

Imagine replacing the poor in-memory reimplementation of database queries that most graphql servers do with actual optimised database queries. And better code on top.

Boom. You're operating close to the speed of light.

Zvez · on March 1, 2023

>Imagine replacing the poor in-memory reimplementation of database queries that most graphql servers do with actual optimised database queries. And better code on top.

But you are actually talking about optimizing system design, and not reducing virtual function calls :).

And that's the point of this thread: you need to optimize the parts that slow you down the most.

So no, in most cases optimizing virtual calls won't bring you from 1s to 30ms

Zvez · on March 1, 2023

No, take your typical web service application. Even if you use design patterns that are 10x slower, your program will still be as fast as your DB and overall system architecture. And the choices on your DB schema, indexes and caches will have 100x more effect on your 99th-percentile response time than the design patterns you use.

whstl · on March 1, 2023

That's only true if the DB is equally problematic, and your programming language/framework itself isn't compounding the overhead caused by the "10x slower" architecture.

On slower languages with slower runtimes, something that is 10x slower than normal code will have much more overhead than in the examples demonstrated by Casey. It won't be about "30ms vs 25ms" as some people are saying. In the past I remember seeing differences between 400ms and 20ms between JBuilder and .to_json in a critical endpoint in a Rails app, to give one example. Sure, one is "cleaner", but in the end it's a 20x overhead that has no place in this case.

Also, the myth that "processors spend time waiting for IO" that is spread across this thread is BS. In reality, that's only true for single-user programs. If the app is part of a distributed system, the CPU time can be used to serve more users. This allows you to significantly delay more complex scalability efforts, which is also precious developer (or DevOps) time.

Not to mention that applying "Clean Code" in the first place also takes precious time, which could be used for features or anything making money, even optimizing the DB. Instead, this time is used to mess up the code in ways that have zero proven efficacy, and some developers instead think are terrible.

trashtester · on March 1, 2023

I would argue that persistence and caching strategies are also design patterns. Of course, if you're not the tech lead/architect it may be out of your hands.

If you're working with databases, performance often boils down to minimizing the number of times you have to access the database (over a multi-service stack, not just a single web service).

WesolyKubeczek · on March 1, 2023

If you can remove thinly spread overhead from your web framework or any wasteful ritual dance you make for each request, it might not be much, but it does add up over the course of 24 hours * number of workers to quite a bit.

rileyphone · on Feb 28, 2023

Not the case if your program is spending most of its time waiting, which is typical these days.

whstl · on March 1, 2023

Not necessarily in the general case. If the program is a single-user, locally run program, then sure. If it's some sort of backend or distributed service, this is just wasted performance that can be used to serve more users. Virtually every distributed service built today is able to take advantage of this.

With a non-pessimal design, not only are you able to spend less money on servers in the long term, you're also able to delay complex scaling strategies. Scaling also costs money.

Not to mention that building something with this kind of overhead also means that a lot of developer time was spent in the first place, which is still expensive in our industry.

Zvez · on March 1, 2023

Using 'unclean' code practices will increase development costs and, more importantly, hurt the maintainability of such code.

>Virtually every distributed service built today is able to take advantage of this

Most services built today can gain much more from using better system design practices.

slaymaker1907 · on Feb 28, 2023

ORMs and slow DB queries kind of go hand in hand. Also, you'd be surprised at how efficient arrays are for membership checks so long as the number of items is even moderately small (as a rough rule of thumb, the only thing that really matters with these kinds of checks is how many cache lines are involved).

jimbob45 · on Feb 28, 2023

Well said. The aphorism "Premature optimization is the root of all evil" is meant to mean "Build it right first, then optimize only what needs to be optimized". There's really no need to start cooking spaghetti right off the bat. Clean code with some performance tweaks will be more maintainable in the long run without sacrificing performance.

TeMPOraL · on Feb 28, 2023

Casey's implied point is that clean code is already sacrificing performance from the start. And of course real life tells us that those "performance tweaks" will never happen.

There is this popular wisdom that security must be designed for from the start, and cannot be just added after the fact. Performance is like that too, except worse, because you actually can add security after the fact - worst-case, you treat the entire system as untrustworthy and wrap a layer of security around it. You can't do that with performance - there is no way to sandbox your app so it goes faster. You can only profile things then rip out the slow parts and replace them with fast ones - how easy that is depends on the architecture and approach you adopt early in the project.

eyelidlessness · on March 1, 2023

> And of course real life tells us that those "performance tweaks" will never happen.

I seldom wish I could post images on here, but I really would love to share a photo of the surprise coffee mug my work sent me, with a screen cap of a completely obliterated Y axis on a performance monitoring graph. Granted I don’t get to spend all my time hunting optimizations, but some teams/orgs/companies do very much value performance very explicitly.

Edit: and I’m definitely not a game dev. Though I’ve been itching to borrow some game dev techniques that are quite applicable for my domain (particularly ECS, entity component systems, which I suspect have far broader applicability than their adoption outside of game dev).

Turskarama · on Feb 28, 2023

This is not my experience at all, at my org we always write it the simple way first, and if it needs more performance after the fact then we always add the performance. User friendliness always comes first. This includes not waiting 20 seconds for a db query that could be rewritten to happen in 1 second, but it also means not waiting a month for a bugfix which could happen in a day with maintainable code.

TeMPOraL · on March 1, 2023

That's a nice approach, and I envy you the environment you work in.

In my experience so far, it's typically the case that the 20 second DB query will annoy people for months or years - it won't get solved until enough people raise enough of a stink that someone finally prioritizes it. A large customer suddenly starting to make vague hints about bad performance is sometimes (but not always) helpful.

Some may say that "annoying, but not enough to make a stink about it" means it's fine to not optimize it. But I found that people can suffer a lot, and it doesn't mean it's harmless. People will adjust their workflows to minimize the frustration. When some of your "victims" are in-house users, the "not important enough" performance issue may be silently but continuously losing company money.

eyelidlessness · on March 1, 2023

> In my experience so far, it's typically the case that the 20 second DB query will annoy people for months or years - it won't get solved until enough people raise enough of a stink that someone finally prioritizes it.

Among the many things I’ve learned as a self taught dev: you can be the person who raises enough of a stink if you care a lot. It’s not a thing you want to invoke frequently, but it’s a thing you very probably have power to invoke where it matters most. If you can make a good business case (or any case for user success that impairs your org), you have very good odds of being able to pursue it in any but the most toxic situations. If you can link $thing-you-want-to-pursue to other probably shinier biz/org goals, you’re 99% of the way there.

loup-vaillant · on Feb 28, 2023

> there is no way to sandbox your app so it goes faster.

In a number of cases you actually can. Casey demonstrated it in his Refterm lectures, it's caching. You still call the slow thing, but at least you don't call it as often because you have that layer of caching to partially insulate you from its poor performance. Good luck if you have to deal with cache invalidation, though.
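
A minimal sketch of that caching layer, in Python rather than anything from the Refterm code. `slow_glyph_render` and the `stats` counter are hypothetical stand-ins for "the slow thing" you cannot make faster yourself.

```python
from functools import lru_cache

stats = {"slow_calls": 0}  # counts how often the slow path actually runs

def slow_glyph_render(codepoint: int) -> str:
    """Hypothetical stand-in for an expensive call we cannot speed up."""
    stats["slow_calls"] += 1
    return f"glyph-{codepoint:04x}"

@lru_cache(maxsize=1024)
def cached_glyph_render(codepoint: int) -> str:
    """Same result as the slow call; repeated inputs are served from the cache."""
    return slow_glyph_render(codepoint)

text = "hello hello hello"
rendered = [cached_glyph_render(ord(ch)) for ch in text]
```

The slow function still runs once per distinct input, just not once per call. The invalidation problem starts as soon as the slow call's answer can change; `cached_glyph_render.cache_clear()` is the blunt instrument for that.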

TeMPOraL · on March 1, 2023

Fair enough.

I'll concede on saying that performance and security are alike - you can add some of either after the fact, but you're better off thinking about both from the start.

> Good luck if you have to deal with cache invalidation, though.

Ain't it the truth. Adding a cache is easy. Understanding the implications of doing it is harder.

peterashford · on March 1, 2023

This isn't true in my experience. Even in the context of games development. It pays to be simple at the outset because you often need to iterate code to get it right and writing and iterating optimised code is harder and slower than just doing something simple first. And YES, we ALWAYS went back and optimised the slow bits.

whstl · on March 1, 2023

But "being simple from the outset" is exactly what Casey is advocating here. Start with the simple code, that he ends up with, rather than optimizing for a "Cleanliness" metric.

His final code is definitely simpler than the alternative, which would probably involve several files in another environment.

Sure, he does reach for a benchmark, but that's merely to demonstrate the end result.

Aeolun · on March 1, 2023

> Casey's implied point is that clean code is already sacrificing performance from the start.

Where our opinion seems to diverge is where I accept that as being fine, for the sake of being clean/legible/understandable.

curtainsforus · on March 10, 2023

The argument continues to say that the 'clean code' version isn't actually cleaner/more legible/more understandable. I, for one, am royally sick of spending 5 minutes trying to figure out which virtual function implementation was actually called!

saagarjha · on Feb 28, 2023

Adding performance to software after the fact is typically far easier than securing it. Layers of security have to be watertight, and generally they are not if you don't design them from the start to be. A good performance engineer works in the constraints of their project to balance minimal code changes with performance wins.

noobermin · on March 1, 2023

To be fair to OP, does anything about your current paradigm even allow you to evaluate his claims? Is your code even in a form that you can for example remove polymorphism and use simple arrays of data you want to work over?

You can of course only optimize what you are looking to optimize. I am not surprised (honestly) that some engineers do not realize they are in fact primed for the kind of things they will find, based on what they are looking for, by the mere choice of where they want to look.

DeathArrow · on March 1, 2023

Maybe that's valid for a low user count. We run a large multitenant, microservice-based application, and the physical machines where the Kubernetes pods reside have their CPUs at 90%. The application makes such heavy use of "clean coding", "design patterns" and SOLID that it would make Uncle Bob proud. We would have been better off without so many abstractions on top of abstractions.

flumpcakes · on March 1, 2023

This is my experience. As I have said elsewhere, perhaps I am unlucky and work with "bad" developers, but everything being built for the future at $COMPANY is over complicated and has a multitude of abstractions.

Honestly, for most software, a well performant "monolith" is probably enough.

We had one service that had a single-thread bottleneck that none of the developers could figure out - the solution was to spin up 30x more virtual machines to run instances of this app to meet average production demand.

nijave · on March 1, 2023

In a business context, it usually happens that hardware is cheaper than software (licensing) which is cheaper than engineering labor. Tack on the opportunity cost of delaying business advances/features and it's usually cheaper to just throw hardware at it.

There's a tipping point where you have so much hardware there's big savings with optimization. Things like Postgres and the Linux kernel have a lot of optimization put into them and there's an insane amount of hardware out running that code.

That said, slow software sucks.

munk-a · on March 1, 2023

I'd actually say this article is generally unhelpful - it's good to be aware but as someone who works on sorting out performance critical things I want the code to be as clean as humanly possible going in. Whether you write clean or dirty code if you're a junior developer you're probably not going to write performant code and even senior devs may be able to sniff what might be a bottleneck in advance but most of us have learned to avoid premature optimization like the plague.

Maintainability and cleanliness are the best virtues code can have. If you have extremely clearly written code that has performance issues, I can swoop in with analysis tools, figure out where the pain point is and refactor it out. Sometimes this is a real headache[1], sometimes not - what I can guarantee is that if the code is "dirty" it's going to be a headache and it'll take more time.

I'd personally take issue with this article over the polymorphism claim though - polymorphism is a tool but it isn't the be-all and end-all tool. A lot of your data can live as structs/blobs in memory with tight internal type definition but without any OO principles. Personally I am a huge fan of functional programming (but not pure functional programming) so objects that I use are relatively few and far between and exist to fulfill a very specific purpose.

I've had two occasions in working when I needed to break out an asm block - the compiler was being a thick headed dummy and this code needed to receive incoming signals without exception or delay - but once that critical section was passed? Back to high level programming and statements favoring expressiveness over raw bare metal performance.

If you want an interesting experience talk to your closest non-technical manager type - be that a product team manager or the company owner - and ask them if they'd prefer if you focus on reducing how long your product takes to execute by 20% over the next five years or if they'd prefer you to lower the growth of the developer labor budget by 20% for the next five years by focusing on maintainability over performance. With the exception of extremely niche cases maintainability is always the golden standard.

1. For instance, I've dealt with OOM issues that have required transforming all logic on a query result to be lazily evaluated on a data stream after main execution finishes - like the logic goes up and down the stack and only then begins processing results. In this particular case the problem was rather easy to deal with because we essentially swapped out the actual value passing on each layer for a lazy result set being passed around - because the code was clean. Sometimes you'll definitely need to massively re-engineer things though.

whstl · on March 1, 2023

"Whether you write clean or dirty code"

I feel like there's a misunderstanding here. Casey is clearly not against writing non-capitalized clean code at all. His code in the end is "cleaner" than what he criticizes IMO. What he is criticizing here is capitalized (and possibly trademarked) "Clean Code", the book and philosophy spearheaded by Uncle Bob.

bigbacaloa · on March 1, 2023

Maintainability and cleanliness are not the best virtues code can have. Far more important are that it work correctly and quickly.

munk-a · on March 1, 2023

I agree that correctness is pretty essential (as in - actually does what it says, though something that's mostly correct is almost always the bar... most software doesn't need to be entirely correct). But I am confused about "quickly" do you mean dev time or execution time?

osigurdson · on March 1, 2023

>> Lack of concurrency/parallelism

Definitely get the single-threaded house in order before attempting to speed up by running in parallel.

yxhuvud · on March 1, 2023

Depends. For example, if the slowness comes from sequentially emitting a lot of http requests, a lot of performance can be gotten from doing it concurrently.
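
A rough sketch of that case in Python. `fake_http_get` is a hypothetical stand-in that sleeps instead of touching the network, and the URLs are made up; the point is only that the waits overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_http_get(url: str) -> str:
    """Hypothetical blocking request: sleeps instead of hitting the network."""
    time.sleep(0.05)
    return f"response from {url}"

urls = [f"https://example.invalid/item/{i}" for i in range(8)]

# Sequential: total latency is roughly the sum of all request latencies.
start = time.perf_counter()
sequential_results = [fake_http_get(u) for u in urls]
sequential_seconds = time.perf_counter() - start

# Concurrent: the waits overlap, so the total is closer to one request's latency.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    concurrent_results = list(pool.map(fake_http_get, urls))  # map keeps input order
concurrent_seconds = time.perf_counter() - start
```

None of the per-request code got any faster here; only the waiting was restructured, which is why this kind of fix does not depend on tidying up single-threaded performance first.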

jbverschoor · on March 1, 2023

Well that's exactly the difference between systems programming and application programming.

Don't forget virtual machines and interpreters.

Dave3of5 · on March 1, 2023

> Slow DB queries

What I've seen is slow queries but a bigger problem is actually too many queries. It's easy to do especially when using an ORM.

It mostly happens on change: when someone wants to add something to an existing query, they just add their new query and slop it into a loop. Boom, performance is gone.
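
A small sketch of the "query in a loop" pattern, using Python's built-in `sqlite3` with an in-memory database. The schema, the `run` helper and the counter are all made up for illustration; an ORM would hide the per-row query behind a lazy attribute access.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO posts VALUES (1, 1, 'a1'), (2, 1, 'a2'), (3, 2, 'b1'), (4, 3, 'c1');
""")

counter = {"queries": 0}

def run(sql, params=()):
    """Hypothetical helper: execute a query and count the round trip."""
    counter["queries"] += 1
    return db.execute(sql, params).fetchall()

# The loop version: one query for the posts, then one more per post.
looped = []
for _post_id, author_id, title in run("SELECT id, author_id, title FROM posts ORDER BY id"):
    (name,) = run("SELECT name FROM authors WHERE id = ?", (author_id,))[0]
    looped.append((title, name))
looped_queries = counter["queries"]

# The single-query version: let the database do the join.
counter["queries"] = 0
joined = run("""
    SELECT posts.title, authors.name
    FROM posts JOIN authors ON authors.id = posts.author_id
    ORDER BY posts.id
""")
joined_queries = counter["queries"]
```

Both versions return the same rows. The loop costs one round trip per post plus one, and with a real networked database each of those round trips is the expensive part.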

LorenPechtel · on March 1, 2023

One I found with profiling--when writing the code n was quite small. Many database operations simply iterated over an array to decide where to store an item. The time spent dealing with the data was a tiny fraction of the database round trip time, there simply was no reason to get fancy.

Over the years they've grown and one bin showed up where jobs would sit around for weeks--for that case n went from rarely having a screenful of lines to a few thousand--oops, now more than 2/3 of the time was spent in those searches. (And I have a sneaking suspicion that a good portion of the remaining time comes from using field names to retrieve values. The profiler doesn't separate that out, though, because it's not my code.)
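
For what it's worth, the usual fix for that kind of search is to build an index once instead of scanning per lookup. A sketch in Python with invented data; the `bins` records and field names are hypothetical, not the actual system described above.

```python
# Hypothetical records; a few thousand rows, as in the slow case.
bins = [{"bin_id": i, "label": f"bin-{i}"} for i in range(3000)]

def find_bin_linear(bin_id):
    """O(n) per lookup: fine for a screenful of rows, painful for thousands."""
    for b in bins:
        if b["bin_id"] == bin_id:
            return b
    return None

# Build the index once; each later lookup is a single hash probe on average.
bins_by_id = {b["bin_id"]: b for b in bins}

def find_bin_indexed(bin_id):
    """Same answer as the linear scan, without walking the whole list."""
    return bins_by_id.get(bin_id)
```

The index has to be kept in sync if `bins` changes, which is exactly the kind of extra machinery that was not worth it while n was small.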

gofreddygo · on March 2, 2023

Some more indirect reasons

- shit UX "ideas" that trigger 10 new API calls to show dialogs and popups.

- Logging libs under pressure

- overemphasis on testability

anitil · on Feb 28, 2023

How did you get in to that work? I love this sort of optimization but not sure how to get people to pay me to do it full time.

rkuska · on March 1, 2023

And let's not forget that the majority of CPU cycles are spent on encoding/decoding json requests.

thermin · on March 1, 2023

Still unvectorized in Java

hallqv · on March 1, 2023

Great reply, thanks for sharing your experiences!

e28eta · on March 1, 2023

Author could use a little more memoization in his example, but I suspect that breaks some of the simplicity of his argument.

If shape Area is computed often enough that you care about inlining the calculation, why not compute & store it every time the height / width change. That’d be easy enough in an architecture based on information hiding, and might illustrate a legitimate engineering trade-off between those architectural choices.

cratermoon · on March 1, 2023

Right? His Area() function does the calculation from scratch on every call. Either a) make the Shape immutable and calculate the area once, at creation time, or b) have the mutator functions recompute the area when they are called. At that point Area() just returns an f32, and the compiler can do all kinds of optimizations.
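
A sketch of the second option, in Python rather than the article's C++. The `Rectangle` class and its `resize` mutator are hypothetical names, not the article's code.

```python
class Rectangle:
    """Hypothetical shape that stores its area instead of recomputing it per read."""

    def __init__(self, width: float, height: float):
        self._width = width
        self._height = height
        self._area = width * height  # computed once at creation time

    def resize(self, width: float, height: float) -> None:
        """The only mutator: recompute the stored area once per change."""
        self._width = width
        self._height = height
        self._area = width * height

    @property
    def area(self) -> float:
        return self._area  # plain read, no arithmetic per call

r = Rectangle(3.0, 4.0)
area_before = r.area
r.resize(2.0, 5.0)
area_after = r.area
```

This moves the cost from every read to every write and adds one stored field per shape, so it only pays off when reads outnumber mutations.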

incrudible · on March 1, 2023

You are completely changing the problem. Remember, this is an example, the code does the same thing in either implementation.

cratermoon · on March 2, 2023

> completely changing the problem

Indeed. That's the point. Why let someone with an axe to grind define the problem in a way they can solve with their axe? The author is using optimization as the frame for rejecting a particular way of working, I'm pointing out that the definition is set up to make the solution work. I don't concede the terms of the debate before it even begins.

cratermoon · on March 1, 2023

Speaking of mutators, his example depends on the code executing in a single thread. What happens if calls to mutate the shape and calls to calculate the area are interleaved?

mquander · on Feb 28, 2023

> Look, most modern software is spending 99.9% of the time waiting for user input, and 0.1% of the time actually calculating something. If you're writing a AAA video game, or high performance calculation software then sure, go crazy, get those improvements.

That's really not even close to true. Loading random websites frequently costs multiple seconds worth of local processing time, and indeed, that's often because of the exact kind of overabstraction that this article criticizes (e.g. people use React and then design React component hierarchies that seem "conceptually clean" instead of ones that perform a rendering strategy that makes sense.)

simplotek · on Feb 28, 2023

> That's really not even close to true. Loading random websites frequently costs multiple seconds worth of (...)

You attempted to present an argument that's a textbook example of a hyperbolic fallacy.

There are worlds of difference between "this code does not sit in a hot path" and "let's spend multiple seconds of local processing time".

This blend of specious reasoning is the reason why the first rule of software optimization is "don't". Proponents of mindlessly going about the 1% edge cases fail to understand that the whole world is comprised of the 99% of cases where shaving off that millisecond buys you absolutely nothing, with a tradeoff of producing unmaintainable code.

The truth of the matter is that in 99% of the cases there is absolutely no good reason to run after these relatively large performance improvements if in the end the user notices absolutely nothing. Moreso if you're writing async code that stays far away from any sort of hot path.

mquander · on Feb 28, 2023

The subfields of programming I know the most about are game development, networking, and web development, and in all of those, it's not the case that only 1% of the code is in the "edge case" where performance matters at all.

For example, in the case of web development, if you build a medium-sized website with React (i.e. pretty normal behavior nowadays), then if you make default decisions that don't consider performance at all, your website will end up noticeably slow, because you will:

1. Write code that re-renders components all the time during loading and UI interactions,

2. Which depend on tons of third-party dependencies that perform poorly,

3. So you end up spending a ton of time in re-renders while the site loads and while someone is using it.

Dealing with this isn't literally the same performance work that Casey put in his article, because it's at a slightly higher level of abstraction, but it requires the same mindset. It requires writing most of your code (and taking on dependencies) with performance in mind, not just 1% of it. You can't avoid it without your notion of "clean code" including some amount of mechanical sympathy, rather than just being about abstract extensibility and generalization concerns.

whstl · on March 1, 2023

It's also very easy to mess up the performance due to architecture when using modern frameworks. For example: as much as SPAs are reviled, if you have a complex enough web application with heavy components, having it being an SPA might be better in terms of perceived speed than having it do a full reload on every navigation.

This is a mistake that I see far too many government and utility sites making.

Aeolun · on March 1, 2023

Except that in React writing clean code will actually make your application faster.

engineeringwoke · on March 1, 2023

Re-renders are incredibly cheap in the big picture. They are not a source of performance bottlenecks in 99% of real world applications

xtian · on March 1, 2023

So why do 99% of real world applications run like garbage? What’s the culprit?

simplotek · on March 1, 2023

> So why do 99% of real world applications run like garbage?

If you're really interested in the impact of performance issues on everyday life, you need to provide concrete examples instead of putting up unverifiable strawmen.

The truth of the matter is that 99% of real world applications run just fine, and it doesn't pay off to invest in shaving milliseconds here or there. Would it be desirable to have a magic wand to improve some edge cases? Yeah, why not? Is it worth paying people to spend time with a stopwatch at hand to shave off these milliseconds? Not really. It's all about tradeoffs, and there is no real world payoff in wasting developers' time to shave off that millisecond here or there.

mtrower · on March 1, 2023

You know, I started on some concrete examples for you, but I had to stop and back up. Really? You can't think of any examples yourself? The modern web is absolute hell to use if you aren't on modern hardware. Try it sometime; use a 10 year old phone, or an old computer that wasn't built top-of-the-line.

There's so much hardware out there that can run native applications just fine, that can play back HD video, that can run complex 3D real time video games, but crawl like molasses when loading your average webpage. Facebook and YouTube are terrible offenders, but so are your average blogs. Many banking websites are terrible (yet they don't have to be; my local credit union has a zippy website that looks attractive to boot, has modern design elements, etc.).

Maybe the hardware you're running is eye-wateringly fast, or maybe it's just barely fast enough and you don't need the cycles for anything else. But we're not talking milliseconds. We're talking order(s) of magnitude. I can't bring myself to believe you don't see at least some of it, if you just open your eyes and look around.

albedoa · on March 1, 2023

> Really? You can't think of any examples yourself?

"Unverifiable!", he wrote from within a web browser.

sanitycheck · on March 1, 2023

The trouble with this attitude is that it misses the fact that the _range_ of computing power of devices which ordinary people use is probably greater now than it's ever been.

Let's not even get into low end phones in developing countries or bargain basement android tablets, let's stick with something straightforward - an ordinary PC.

I took a quick look online, sorting by cheapest first I found something with an AMD 3015e in. Based on cpubenchmark.net that gets a benchmark of 2691. Taking a look at the big list of CPUs I see that's equivalent to a powerful desktop CPU from 2008, or a decent laptop from 2012. (The Apple M2 in the current Macbook Air gets a score of 15369, just for comparison.)

So, if you're writing PC software or making a fancy web app and you want everyone to have a good experience with it then you should see how it runs on a terrible new laptop, or a 12 year old good laptop, or a 15 year old powerful desktop.

(And yeah, we all have SSDs now, which is much better than in the old days, and JS is generally single-threaded, and single-threaded CPU performance has not improved so much - but I think my point still stands.)

nottorp · on March 1, 2023

You basically waste millions of seconds of users' lifetime instead :)

Not to mention that web apps redrawing everything whenever they feel like it can almost give motion sickness, lead to clicks on the wrong things etc.

Yeah, but users' time is worthless.

whstl · on March 1, 2023

Since we're talking about React, Facebook's web app is incredibly slow to me, and typing text in some fields is slower than me. As in, it sometimes takes a second for the letters to start showing. And no, it's not a browser rendering issue (even if it were it would be terrible), as disabling JS in Safari brings back the performance.

Another app, which I'm not sure is React or not, is New Reddit. It is significantly slower on my computer and on a lot of people's computers, and sometimes you have to refresh the page because it consumes too much memory.

I can come up with other local examples, from Germany. Vattenfall's website seems to use "traditional" page navigation, but the content is loaded via a framework. Due to having to reinitialize everything, text takes up to 5 seconds to appear when using the back button or when navigating from page to page. Similar things happen on the German Agentur für Arbeit website.

cnity · on March 1, 2023

My hot take is that React doesn't directly _cause_ slow performance, but that React is so well marketed as a first place to start for new programmers that the bar for quality is much lower than other industries (like game programming, or even "vanilla JS" which is increasingly seen as an advanced approach).

wruza · on March 1, 2023

The promise of frameworks like React is that you write code in their way and they take care of performance, because functional, declarative and all that. You just use primitives and don’t control how it all works under the hood. Coping may be a good strategy here, but isn’t a good argument.

xtian · on March 1, 2023

> The truth of the matter is that 99% of real world applications run just fine, and it doesn't pay off to invest in shaving milliseconds here or there.

You're taking deep quaffs of the Kool-Aid and so are most of the people commenting on this story. General software responsiveness and reliability (i.e., usefulness) has been in decline for decades. This is an objective fact.

Writing objective reality off as mere "milliseconds", "edge-cases", or only relevant for "toy problems" exemplifies the arrogance and severe incompetence of most programmers. People are seriously trying to talk down to Casey Muratori when in all likelihood they haven't accomplished even 1% as much as him as programmers.

I get it—no one wants to leave fantasy-land as long as the easy money is flowing. But sooner or later the glittering carriage turns back into a pumpkin.

gitgud · on March 1, 2023

Go to a slow website on Chrome. Open dev tools and turn on "enable paint flashing", you'll see a seizure inducing light show of re-rendering...

It's certainly a huge problem for real world applications

loup-vaillant · on Feb 28, 2023

I realised when I implemented EdDSA for Monocypher that optimisations compound. When I got rid of a bottleneck, I noticed that another part of the code was the new bottleneck, and some of the optimisations compounded multiplicatively. It took many changes before I finally started to hit diminishing returns and stop. All while restricting myself to standard C99, and trying fairly hard not to spend too many lines of code on this.

My point being, if most of the program is slowed down by a slew of wasted CPU cycles (costly abstractions, slow interpreted language…), there's a good chance what should have been obvious bottlenecks get drowned in a sea of underperformance. They're harder to spot, and fixing them doesn't change much.

So before you even get to actual optimisation, your program should be fast enough that actual optimisations have a real impact. And yes, actual optimisation should be done quite rarely. But first, we need to make sure our programs aren't as slow as molasses. See https://www.youtube.com/watch?v=pgoetgxecw8

simplotek · on March 1, 2023

> I realised when I implemented EdDSA for Monocypher that optimisations compound.

I feel you're missing the whole point.

It's immaterial whether anyone can get to optimizations that compound multiplicatively. The whole point is that halving something that costs nothing earns you nothing. That's the whole point. Go ahead and shave off that millisecond. Will anyone actually notice whether you add or remove that penalty? Odds are, not at all.

loup-vaillant · on March 1, 2023

People did notice. Quite a few happy users are glad signature verification took less than a second instead of more than 3. Or 30, if you compare to some of the alternatives. Others love the fact it uses 2KB of stack space instead of 5.

Monocypher's speed was actually an important component in its success in the embedded market, even though I didn't explicitly target it initially (I was lucky my portability driven decisions made it a good fit there).

tharkun__ · on March 1, 2023

Not your parent but I think that is exactly the point lots of people here are making.

There definitely are niches where there are quite a few performance optimization opportunities that users do care about.

In your example making something a user is actively waiting for go from 3 seconds to less than one is a great optimization target. What is not a great optimization target is making something the user is actively waiting for and that takes 30ms take 25ms instead. That's wasted money on developer time.

If your "user" is a developer of embedded software with memory constraints and using your library leaves them more room that's awesome. If your user was someone using the library on a general purpose computing device with loads of memory then the 2 vs. 5 does nothing.

rerdavies · on March 1, 2023

You need to do some research on how to get modern C/C++ compilers to vectorize. ;-) No assembly required, and not that hard to restructure code. (But MUCH easier in C++).

loup-vaillant · on March 1, 2023

I tried auto-vectorisation, and it worked pretty well. But the code became just as big as using intrinsics would have (that with explicitly unrolling loops and rearranging things in memory), and intrinsics generated code that was easily 35% faster.

I decided to put it off for later, and keep things simple for now.