The "lean C program" alternative the author presents is neither a correct equivalent of fscanf nor technically correct in its own right, and people here recognized this. My attempt to summarize:
- A proper conversion has enough corner cases that a naive implementation will almost certainly miss some. The author's version provably produces different results than the library one for many of the same inputs.
- fscanf must process the format string on every call, so we're not comparing the same functionality. There are library functions, such as strtod, that don't need a format string at all.
- The standard library must be able to correctly process Unicode and locales.
See the other comments for more details. Of course, it may be that the author doesn't need more than what he implemented, but he didn't discuss the tradeoffs or the incorrect results in the article, and readers are advised to investigate before using his code.
Another source of overhead is that fscanf() on POSIX systems (like most operations on FILE *) is required to lock the underlying FILE structure. Even without contention, the overhead is non-trivial.
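To illustrate the locking point: POSIX provides flockfile()/funlockfile() and the _unlocked I/O variants, so a hot loop can take the FILE lock once instead of on every call. A minimal sketch (the function name and newline-counting task are just for illustration):

```c
#include <stdio.h>

/* Take the FILE lock once with flockfile(), then use getc_unlocked()
 * inside the loop; plain getc() would lock and unlock on every call.
 * flockfile/funlockfile and getc_unlocked are POSIX, not ISO C. */
long count_newlines(FILE *fp)
{
    long lines = 0;
    int c;

    flockfile(fp);                  /* acquire the stream lock once */
    while ((c = getc_unlocked(fp)) != EOF)
        if (c == '\n')
            lines++;
    funlockfile(fp);                /* release after the whole pass */
    return lines;
}
```

The same idea applies to fgetc_unlocked()/fputc_unlocked() on glibc; it doesn't help fscanf() itself, which does its locking internally.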
This is partly because converting strings to floating-point numbers accurately is surprisingly subtle to get right; the following links describe a bug in a library function that had been around for over two decades:
If you have numeric data in e.g. NumPy arrays and you care about performance, numpy.savetxt() followed by text parsing in C++ will never match the speed of just storing the binary data. For example, use numpy.save() then load the NumPy format in C++ (it's a pretty simple format), or use a more cross-language format like HDF5.
Even if you have received data from an outside source in text format, perhaps the first thing you should do is convert it to a binary format on disk; this will save you a lot more overhead than any amount of scanf() optimization will.
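The convert-once idea above can be sketched in C: parse the text a single time, dump the raw doubles, and make every later load a single bulk fread with no parsing at all. File names and the flat whitespace-separated layout are assumptions for illustration:

```c
#include <stdio.h>
#include <stdlib.h>

/* One-time conversion: the slow text parse happens exactly once. */
long text_to_binary(const char *txt_path, const char *bin_path)
{
    FILE *in = fopen(txt_path, "r");
    FILE *out = fopen(bin_path, "wb");
    if (!in || !out) {
        if (in) fclose(in);
        if (out) fclose(out);
        return -1;
    }

    double v;
    long n = 0;
    while (fscanf(in, "%lf", &v) == 1) {
        fwrite(&v, sizeof v, 1, out);
        n++;
    }
    fclose(in);
    fclose(out);
    return n;                       /* number of values converted */
}

/* Every later load is one fread call, no per-value work. */
double *load_binary(const char *bin_path, long n)
{
    FILE *in = fopen(bin_path, "rb");
    if (!in) return NULL;
    double *buf = malloc((size_t)n * sizeof *buf);
    if (buf && fread(buf, sizeof *buf, (size_t)n, in) != (size_t)n) {
        free(buf);
        buf = NULL;
    }
    fclose(in);
    return buf;
}
```

Note this raw dump is not portable across endianness or struct layouts; a self-describing format like .npy or HDF5, as suggested above, handles that for you.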
It certainly depends. On the one hand, if your input data is read only once or a few times, introducing an intermediate binary format will often not be worth it, and optimizing the ASCII reading is the way to go.
On the other hand, if you have to crunch through huge amounts of locally stored data, try to make most of your binary file-format resemble the in-memory layout as closely as possible. That way you can, with some luck, mmap() your file and just use pointers into the bulk data. Saves at least one level of copying, from the filesystem cache to application heap. Be careful with validation of lengths and offsets, though!
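A sketch of the mmap() approach described above, with the length validation it calls for. The on-disk layout (a 64-bit count followed by that many doubles) is a hypothetical example chosen to match the in-memory layout:

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical file layout mirroring the in-memory structure, so the
 * mapping can be used directly through pointers with zero copying. */
typedef struct {
    uint64_t count;
    double   values[];              /* C99 flexible array member */
} blob_t;

const blob_t *map_blob(const char *path, size_t *map_len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return NULL; }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping keeps the file alive */
    if (p == MAP_FAILED) return NULL;

    const blob_t *b = p;
    /* Validate, as the comment warns: the declared count must actually
     * fit inside the file, or a truncated/hostile file walks off the map. */
    if ((size_t)st.st_size < sizeof(uint64_t) ||
        b->count > ((size_t)st.st_size - sizeof(uint64_t)) / sizeof(double)) {
        munmap(p, (size_t)st.st_size);
        return NULL;
    }
    *map_len = (size_t)st.st_size;
    return b;
}
```

The caller reads b->values[i] straight out of the page cache and eventually calls munmap() with the saved length.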
In my opinion that's bad C/C++ code: it's neither proper C nor proper C++, it has two `while (!feof(fd))` loops, it's compiled without diagnostics enabled, and it's an inferior parser with at least one possible integer overflow (in ConsumeInteger()).
If it works without problems for your data, good for you, but as a general-purpose replacement it falls short: fscanf() or strtod() (hopefully) work correctly and don't have weird limitations like 300 / -300 as the maximum / minimum base-10 exponent. Parsing floating-point numbers is not easy.
Using feof() to ask whether a stream has reached EOF before doing any I/O is wrong. Code like this comes up very often on Stack Overflow, so it seems to be somehow intuitive to people, or suggested by some tutorials.
The purpose of feof() is to answer, after an I/O operation has failed, whether it failed because the stream reached EOF.
It's also utterly pointless in this code: it should just check the return value of fscanf() and stop when it fails to convert the desired number of values. That, on the other hand, seems deeply surprising to most people; you very rarely see code that checks the return value when scanning. Which is strange, because I/O can fail.
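The correct pattern is short: no feof() pre-check, just loop on the conversion count. A sketch for the six-floats-per-row case (the row format is taken from the article's example):

```c
#include <stdio.h>

/* Loop on fscanf()'s return value: 6 means a full row was converted,
 * EOF means end of input (or an I/O error), and any other short count
 * means malformed data. No feof() needed anywhere. */
int read_rows(FILE *fp)
{
    float v[6];
    int rows = 0, rc;

    while ((rc = fscanf(fp, "%f %f %f %f %f %f",
                        &v[0], &v[1], &v[2], &v[3], &v[4], &v[5])) == 6)
        rows++;

    if (rc != EOF)
        fprintf(stderr, "malformed input after %d rows\n", rows);
    return rows;
}
```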
Further, since fscanf() basically ignores embedded whitespace (newlines included), I'm not at all sure it will do what's expected and read six floats per physical line of input.
Lastly, yes of course it's strange to use the super-flexible fscanf() if you're looking for performance.
fscanf has to parse and execute the format string. It might be interesting to compare your custom ASCII-to-float converter to one that calls strtof in a loop.
It would also be interesting to compare it to C++'s ">>" operators; one of the reasons to introduce them was exactly to avoid run-time parsing of format strings.
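The strtof-in-a-loop comparison suggested above is easy to sketch; no format string is interpreted anywhere, since strtof itself skips leading whitespace and reports where the number ended:

```c
#include <stdlib.h>

/* Parse up to `max` whitespace-separated floats from `s` into `out`.
 * strtof sets `end` to the first unconsumed character; end == s means
 * no conversion happened, i.e. we ran out of numbers. */
size_t parse_floats(const char *s, float *out, size_t max)
{
    size_t n = 0;
    char *end;

    while (n < max) {
        float v = strtof(s, &end);
        if (end == s)           /* nothing converted: stop */
            break;
        out[n++] = v;
        s = end;                /* continue after the parsed number */
    }
    return n;
}
```

Timing this against both the fscanf version and a custom converter would separate the cost of format-string interpretation from the cost of the float conversion itself.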
Try setting LANG=C before running the program, or doing setlocale(LC_ALL, "C"). That _might_ improve the speed. In any event, locale handling--such as determining whitespace--will slow things down considerably. I presume this is happening on Linux, and glibc's locale handling is dog slow. Solaris' is ridiculously fast by comparison. Why, I don't know.
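The setlocale suggestion is a one-liner; the sketch below pins the locale before parsing (whether it helps depends on the libc, as noted above):

```c
#include <locale.h>
#include <stdlib.h>

/* Pin the locale to "C" before parsing, as suggested above. In a
 * locale where ',' is the decimal separator, strtod("3.14", ...)
 * would otherwise stop at the '.'. */
double parse_with_c_locale(const char *s)
{
    setlocale(LC_ALL, "C");
    return strtod(s, NULL);
}
```

In a real program you'd call setlocale once at startup, not per parse; LC_NUMERIC alone is enough if you want other categories localized.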
I had an application where I was reading a 3 GB text file into memory, directly from an SSD. Using fscanf, the speed topped out at around 100-150 MB/s. Only after I switched to fread was the data actually read at the drive's maximum speed of over 400 MB/s. I guess it's due to the parsing fscanf is doing.
The C program isn't doing the same thing. It overwrites the array of variables each time it scans a line. numpy.loadtxt returns an array of the parsed lines. So you'd need to do some dynamic allocation to store the parsed values. If the number of fields per input line isn't hardcoded, you'll need to walk sscanf() across each input line while managing input and output buffers (you can call read(), which is much faster, but you'll need to handle partial line reads). numpy.loadtxt has additional capabilities, all of which add per-line overhead. A fully functional clone of loadtxt in C is unlikely to run enormously faster, i.e. enough to justify the programming involved. In any case, this toy exercise isn't testing the speed of fscanf() in a meaningful way.
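The bookkeeping described above (no hardcoded field count, dynamically grown output) can be sketched with fgets plus a strtod walk instead of sscanf; the realloc growth strategy is one reasonable choice, not the only one:

```c
#include <stdio.h>
#include <stdlib.h>

/* loadtxt-style loading: read lines with fgets, walk strtod across
 * each line (any number of fields), and grow the output array
 * geometrically with realloc. Returns NULL on allocation failure. */
double *load_all(FILE *fp, size_t *count)
{
    char line[4096];
    double *vals = NULL;
    size_t n = 0, cap = 0;

    while (fgets(line, sizeof line, fp)) {
        const char *p = line;
        char *end;
        for (;;) {
            double v = strtod(p, &end);
            if (end == p)
                break;          /* no more numbers on this line */
            if (n == cap) {     /* grow capacity geometrically */
                cap = cap ? cap * 2 : 64;
                double *tmp = realloc(vals, cap * sizeof *vals);
                if (!tmp) { free(vals); return NULL; }
                vals = tmp;
            }
            vals[n++] = v;
            p = end;
        }
    }
    *count = n;
    return vals;
}
```

Even this skips parts of what loadtxt does (comments, delimiters, dtype handling, lines longer than the buffer), which supports the point that a faithful clone carries real per-line overhead.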
I'm really not surprised. fscanf is a very general function: it handles many cases and has to do 'perfect' text-to-floating-point conversions.
As a rule of thumb the formatted I/O functions are quite slow, and floating point makes this much much worse.
It's one of those few places where rolling your own functions for your particular use case will be much more efficient than using the one size fits all standard library solutions.
[sf]scanf is also significantly faster than the C++ iostreams equivalents. I recently sped up an I/O-bound stream-processing program by 40+% by converting it back to sscanf().