Geekbench is an excellent benchmark, and has a pretty good correlation with the ...

BoingBoomTschak · on Nov 1, 2024

Any proprietary benchmark that's compiled with the mystery meat equivalent of compiler/flags isn't "excellent" in any way.

SPECint compiled with either the vendor compiler (ICC, AOCC) or the latest gcc/clang would be a good neutral standard, though I'd also want to compare SIMD units more closely with x265 and Highway based stuff (vips, libjxl).

And how do you handle the fact that you can't really (yet) use the same OS for both platforms? Scheduler and power management counts, even for dumb number crunching.

janwas · on Nov 2, 2024

Good points. gemma.cpp can also be an interesting benchmark, it also uses Highway.

llm_nerd · on Nov 2, 2024

Geekbench is a highly regarded benchmark because it effectively reflects the overall performance of various platforms as experienced by the average user. By "platform," we mean the combination of hardware and software—how systems are actually used in day-to-day scenarios.

Specint, on the other hand, is useful for assessing specific tasks if you plan to run identical workloads. However, its individual test results vary widely. For example, Apple Silicon chips generally perform well in Specint but might match a competing chip in one test and be three times faster in another. These tests focus on very narrow tasks that can highlight the unique strengths of certain instructions or system features but are not representative of overall real-world performance.

The debate over benchmarks is endless and, frankly, exhausting, as it often rehashes the same arguments. In practice, most people accept that Geekbench is a reliable indicator of performance, and I maintain it’s an excellent benchmark. You might disagree, but my stance stands.

BoingBoomTschak · on Nov 2, 2024

Lots of appeal to popularity, "most people accept" a lot of things.

>Specint, on the other hand, is useful for assessing specific tasks if you plan to run identical workloads. [...] These tests focus on very narrow tasks that can highlight the unique strengths of certain instructions or system features but are not representative of overall real-world performance.

What? First, SPECint is an aggregate of 12 benchmarks (https://en.wikipedia.org/wiki/SPECint#Benchmarks), none of them synthetic in any way. They're also ranging from low to high level, it's not just number crunching. Sure, it's missing stuff like browser benchmarks to better represent the average user, but it's certainly not as useless as what you seem to imply.

Any "system wide" benchmark is aggregating too much into a single number to mean anything, in any case.

And this subthread is about using benchmarks to compare HARDWARE, not whole systems, so this discussion is pretty much meaningless.

llm_nerd · on Nov 2, 2024

I never said SPECInt was synthetic though did I? What are you arguing against?

Yet a benchmark of how Xalan-C++ transforms XML documents has shockingly little relevance to most of the things I do. And the M1 runs the 400.perlbench benchmark slower than the 5950X, yet it runs the 456.hmmer benchmark twice as quickly, both I guess mattering if I'm running those specific programs?

As with the strawman that I said it was synthetic, I also didn't say it was useless. Not sure why you're making things. It's an interesting benchmark, but most people (yup, there's that appeal again) find Geekbench more informative.

And, again, most people, including the vast majority of experts in this field, respect geekbench as a decent broad-spectrum benchmark. As with all things there are always contrarians.

>And this subthread is about using benchmarks to compare HARDWARE, not whole system

Bizarre. This submission is specifically about Geekbench, specifically about the M4 running, of course, macOS. This subthread is someone noting that they can't escape the negatron contrarians who always pipe up with the No True Benchmark noise.