
Yes... I know it's not a myth. :-) I was responding to someone who was saying that it was a myth.

> It should be faster in nearly all circumstances

As my previous comment showed, that's definitely not true. If you're searching a whole bunch of small files in a short period of time, then it appears that the overhead of memory mapping leads to a significant performance regression when compared to standard `read` calls.

> it’s really the easy case

I know. :-) That's why ripgrep has both modes. It chooses between them based on the predicted workload. It uses memory maps automatically if it's searching a file or two, but otherwise falls back to standard read calls.

Moreover, if ripgrep aborts once in a while because of a SIGBUS, then it's usually not a big deal. It's fairly rare for it to happen. And if it does happen to you a lot or you never want it to happen, then you just need to `alias rg="rg --no-mmap"`.
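To make the mode selection above concrete, here is a minimal sketch in Python (not ripgrep's actual Rust code). The names `choose_strategy`, `contains`, and `MAX_FILES_FOR_MMAP` are hypothetical, and the cutoff of two files is only an assumption standing in for "a file or two"; the `allow_mmap` parameter plays the role of `--no-mmap`.

```python
import mmap
import os

# Hypothetical cutoff standing in for "a file or two"; not ripgrep's real rule.
MAX_FILES_FOR_MMAP = 2


def choose_strategy(paths, allow_mmap=True):
    """Pick "mmap" for a handful of files, otherwise fall back to "read"."""
    if allow_mmap and len(paths) <= MAX_FILES_FOR_MMAP:
        return "mmap"
    return "read"


def contains(path, needle, strategy):
    """Report whether the bytes `needle` occur in the file at `path`."""
    # An empty file cannot be memory mapped, so it takes the read path.
    if strategy == "mmap" and os.path.getsize(path) > 0:
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                return m.find(needle) != -1
    # Standard read: copies the file contents into a buffer before searching.
    with open(path, "rb") as f:
        return needle in f.read()
```

Something like `contains(path, b"needle", choose_strategy(paths))` would then be called once per file. A real tool would stream large files in chunks rather than reading them whole; this only illustrates the choice between the two modes.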



I love ripgrep, btw, great work.

I was pondering this some more in the shower. The mmap case for rg is also sort of naturally cache oblivious: copies consume hardware cache for the write, and while modern hardware has a ton of cache, it’s a noticeable cost on some tests. If you’re searching through something big, then mmap would be like doubling the hardware cache, which is probably really noticeable on smaller devices.

The small files case is interesting: copying the data is faster than patching up the page table tree. I bet there is a strong correlation between the hardware cache size and the average file size in that case. The files probably need to be at least N pages in size for mmap to be worth it, which might be an interesting heuristic to use.
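A rough sketch of what that size heuristic could look like, in Python. The names `pages_needed`, `worth_mapping`, and `MIN_PAGES_FOR_MMAP` are made up for illustration, and the value 16 is a pure placeholder for N; the real crossover point would have to be measured on the target machine.

```python
import mmap

# Placeholder for N; an assumption, not a measured value.
MIN_PAGES_FOR_MMAP = 16


def pages_needed(size_bytes, page_size=mmap.PAGESIZE):
    """Number of pages a file of `size_bytes` occupies, rounded up."""
    return -(-size_bytes // page_size)


def worth_mapping(size_bytes, min_pages=MIN_PAGES_FOR_MMAP,
                  page_size=mmap.PAGESIZE):
    """Guess whether mapping beats copying: only for files of N pages or more."""
    return pages_needed(size_bytes, page_size) >= min_pages
```

With 4096-byte pages and N = 16, this would map files of more than 61440 bytes (15 pages) and copy anything smaller.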



