Gonum – Numerical Computing for Go

openasocket · on Sept 21, 2017

How does this play with Go's scheduler? My understanding is that the Go scheduler is not preemptive, and goroutines are switched out at yield points, like the start of a function body. So a tight loop that doesn't call other functions can effectively hog the OS thread until execution leaves that loop body (no idea what happens when doing FFI, maybe that's done in a separate thread pool?). For most cases where you would use Go you aren't generally doing a bunch of CPU-bound work so that doesn't matter, but here you might run into some hiccups. I'm specifically thinking of a case where you use this library to do some heavy matrix operations as part of a web service, and those tight loops hog the OS threads and hurt your bandwidth and p90 latency.

My question to the developer: is that issue something you've encountered with this library? If not, did you design the library to periodically yield in tight loops, or am I just completely wrong about the Go scheduler?
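For illustration, the "periodically yield in tight loops" idea could look roughly like the sketch below. It is not taken from Gonum; `sumSquares` and the `yieldEvery` parameter are hypothetical names, and the loop body deliberately contains no other function calls. Go releases after this thread (1.14 onward) added asynchronous preemption, so an explicit yield like this matters mostly for the older runtimes being discussed here.

```go
package main

import (
	"fmt"
	"runtime"
)

// sumSquares is a CPU-bound loop whose body makes no function calls.
// yieldEvery is a hypothetical tuning knob: after every yieldEvery
// iterations the goroutine voluntarily hands its thread back to the
// scheduler so other goroutines can run.
func sumSquares(xs []float64, yieldEvery int) float64 {
	var total float64
	for i, x := range xs {
		total += x * x
		if yieldEvery > 0 && (i+1)%yieldEvery == 0 {
			runtime.Gosched() // explicit yield point
		}
	}
	return total
}

func main() {
	xs := []float64{1, 2, 3, 4}
	fmt.Println(sumSquares(xs, 2))
}
```

The yield interval trades scheduling latency against loop overhead, so a real value would have to be measured rather than guessed.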

anonacct37 · on Sept 21, 2017

You're right in that people have seen high p90 latency as a result of things like base64 encoding large blocks.

But one thing to remember is that go inserts gc and pre-emption points at function call sites. So basically as long as a function is occasionally called you're good.

Cgo threading does complicate the matter. My understanding is that cgo calls are done in a threadpool with a larger stack size. I don't know the details about how that threadpool is managed. Not sure if this would help or hurt your concern.

Also, don't forget GOMAXPROCS. There's nothing stopping you from letting the go runtime spin up an arbitrarily large number of OS threads.
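As a small sketch of the GOMAXPROCS point: the setting can be read and changed at run time through the `runtime` package. `currentProcs` is a hypothetical helper and the factor of 2 is arbitrary. Per the runtime documentation, GOMAXPROCS limits how many OS threads execute Go code simultaneously; threads blocked in system calls on behalf of Go code do not count against it.

```go
package main

import (
	"fmt"
	"runtime"
)

// currentProcs reports the GOMAXPROCS setting without changing it
// (an argument below 1 leaves the setting untouched).
func currentProcs() int {
	return runtime.GOMAXPROCS(0)
}

func main() {
	fmt.Println("CPUs:", runtime.NumCPU())
	fmt.Println("GOMAXPROCS:", currentProcs())

	// Raise the limit above the CPU count; the factor 2 is arbitrary.
	prev := runtime.GOMAXPROCS(2 * runtime.NumCPU())
	fmt.Println("previous:", prev, "now:", currentProcs())
}
```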

So it's not an ideal situation, but if you're careful I don't think tight loops are likely to torpedo an otherwise sound go project.

howeman · on Sept 21, 2017

I don't use Gonum with a webserver + large calculations so I can't definitively answer. No one has reported problems, but that could be a lack of usage. One thing though is that matrix multiplication (which is a kernel for higher-level operations) is written in a blocked format, and the code can be pre-empted on any of those blocks, so I wouldn't suspect it's a problem.
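To make the "blocked format" remark concrete, here is a minimal sketch of a blocked matrix multiply. It is not Gonum's implementation; `blockedMul`, `mulBlock`, and `blockSize` are hypothetical names, and the block size of 2 is only there to keep the example small. Each call to `mulBlock` is the kind of function-call boundary where the scheduler gets a chance to switch goroutines, assuming the compiler does not inline it.

```go
package main

import "fmt"

// blockSize is hypothetical; a real kernel would pick a cache-friendly size.
const blockSize = 2

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// mulBlock accumulates the product of one block of a and b into c.
// All matrices are n x n, row-major, stored in flat slices.
func mulBlock(c, a, b []float64, n, i0, j0, k0 int) {
	iMax := minInt(i0+blockSize, n)
	jMax := minInt(j0+blockSize, n)
	kMax := minInt(k0+blockSize, n)
	for i := i0; i < iMax; i++ {
		for k := k0; k < kMax; k++ {
			aik := a[i*n+k]
			for j := j0; j < jMax; j++ {
				c[i*n+j] += aik * b[k*n+j]
			}
		}
	}
}

// blockedMul returns a*b, walking the matrices one block at a time.
func blockedMul(a, b []float64, n int) []float64 {
	c := make([]float64, n*n)
	for i0 := 0; i0 < n; i0 += blockSize {
		for j0 := 0; j0 < n; j0 += blockSize {
			for k0 := 0; k0 < n; k0 += blockSize {
				mulBlock(c, a, b, n, i0, j0, k0)
			}
		}
	}
	return c
}

func main() {
	a := []float64{1, 2, 3, 4, 5, 6, 7, 8, 9}
	b := []float64{9, 8, 7, 6, 5, 4, 3, 2, 1}
	fmt.Println(blockedMul(a, b, 3))
}
```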

openasocket · on Sept 21, 2017

Yeah, skimming your source it seems most of your loops involve calling some function, and even if that's inlined I believe the Go compiler will put a speculative yield call in there.

I suppose my hypothetical would be an issue if you used a non-Go BLAS implementation, as calling out to C will hog the OS thread. But this is a known issue (e.g. https://www.cockroachlabs.com/blog/the-cost-and-complexity-o...).

chewxy · on Sept 21, 2017

The solution to that is to write a C-batcher. Gorgonia uses that (optional) - https://github.com/chewxy/gorgonia/tree/master/blase

(also it's undergoing major reconstruction/refactoring right now)

sbinet · on Sept 22, 2017

also, there's been work to make for-loops preemptible:

- https://github.com/golang/go/issues/10958

- https://go-review.googlesource.com/c/go/+/33910

- https://go-review.googlesource.com/c/go/+/36206

- https://go-review.googlesource.com/c/go/+/46410

- https://go-review.googlesource.com/c/go/+/43050

Thaxll · on Sept 22, 2017

I'm not sure I understand that p90 latency problem. The CPU is used somewhere anyway, so even if you use another language you won't be able to serve a request while doing some intense CPU work?

infogulch · on Sept 22, 2017

The cpu will pause it to give all threads some cpu time. The difference is that it's the OS doing the work of cleaning up between threads, as opposed to the go runtime pausing and switching. Keeping it all in Go is faster, but it doesn't have the capability to pause, cleanup, and prepare for re-execution in the middle of a block of code that the OS does.

paultopia · on Sept 21, 2017

A solid, featureful & performant numerics library seems like a really good match for Go---if it can match numpy but also provide benefits like the safety of types, binary compilation, and better performance in non-numeric code, that's a really exciting case for sliding away from python?

microtonal · on Sept 21, 2017

I have done some numeric programming in Go and compared to Python it's really hampered by the lack of operator overloading.

Of course, it just provides convenience, but it's what makes writing stuff in numpy, Tensorflow, Eigen, etc elegant.

paulsutter · on Sept 21, 2017

As of reading your comment, I'm 100% convinced that Go needs generics. I'm a longtime Go advocate, love coding in Go, but until now thought the lack of generics is just fine.

Lately I do a lot of numpy/tensorflow, and have begun to really dislike the slowness of python. It would be great to do that work in Go specifically.

jerf · on Sept 22, 2017

If Go was to become big in the scientific programming community, it would need generics eventually.

One interesting thing that NumPy demonstrates is that such things are capable of becoming popular enough that they essentially become their own sub-language. One option in that case, if GoNum collected enough of a community, is to fork Go and add generics. There are some complicated generics options that would be difficult to use, but there's some simpler options that would work, and arguably "generics via templated code generation" is pretty much what you'd want for this use case anyhow since it gives the optimizers the most to work with. Said fork might also add some custom optimizations for this use case. I wouldn't want to deviate too far from core Go because I'd like to be able to keep pulling from that code base if at all possible, but some judicious work here might be a net positive.

howeman · on Sept 22, 2017

I can see the argument for operator overloading being necessary, but I don't understand the argument for generics. There's basically no time I've wanted generics coding Go, except a couple times with float64 vs []float64. I also see the need for float32 vs. float64, but that's a very small use case for generics in terms of scope (we can and do autogenerate float32 code).

There's a couple of cases with float64 vs. complex128 matrices, but I have been annoyed with those silent changes in Matlab where the answer is wrong but the code continues anyway.

howeman · on Sept 22, 2017

Sorry, just saw your thing below. I see your point about [2]float64 vs. [3]float64, but that still feels like mostly an operator overloading thing (I realize it isn't exclusively). Most of the time I've dealt with that (say, [3][3]float64 vs. [2][2]float64) the contexts were different enough that generics would not have been useful because there would still have to be type switching.

chewxy · on Sept 21, 2017

I wrote Gorgonia and recently wrote a large piece on my thoughts on having generics in Go - https://blog.chewxy.com/2017/09/11/tensor-refactor/

Would love your thoughts on it

nerdponx · on Sept 22, 2017

One of the big perks of Julia comes from its built-in multiple dispatch. Overloading is one thing, but full-blown multiple dispatch is really powerful in a math context, where the concept of "multiplication" is entirely dependent on the types of things you are multiplying.

In Python (despite there being an excellent `multipledispatch` module) this is mostly just handled by aggressive duck typing ("if it has a .foo method, it's good enough"). In R it's handled with S4 classes, which are cool and kind of CLOS-like but are even slower than single dispatch.

So I guess my question is: why do you need generics when you have interfaces? These other (admittedly dynamically typed) languages make do without.

catnaroek · on Sept 22, 2017

For numerical computing, rather than dynamic multiple dispatch, what is actually desirable is a type system that can figure out statically the kinds of result produced by multiplying different kinds of arguments.

jerf · on Sept 22, 2017

"So I guess my question is: why do you need generics when you have interfaces? These other (admittedly dynamically typed) languages make do without."

Going backwards, as you allude to, dynamic languages fulfill the use cases for generics, as long as you don't care about type safety, which is a thing that is true for the whole language anyhow so it's not much to give up.

For Go, the main problem is that when you're trying to be mathematical, with interfaces you get the worst of both the static and the dynamic worlds. You might like to define an interface that lets you add two vectors, right?

    type Vector interface {
        Components() []float64
    }


    type Add interface {
        Add(Vector) Vector
    }

which might let you implement an Add method on something that is a Vector as well, but you don't get a satisfactory result from either perspective. From the static perspective you can not, using interfaces, guarantee that someone doesn't add a Vector3 to a Vector2, meaning you must either panic at run time or have Add potentially return an error (that will generally not be necessary to check if used correctly, which is not a pleasant error to work with). From the dynamic perspective, you have to remember that what comes out the other end of that operation is always an Add interface value, not a concrete type, so if you have a Vector2 and .Add(Vector2) to it, you don't get a concrete Vector2, you get a value of type "interface Add", which you have to manually cast back to a Vector2 if you want to do anything more than just keep adding to it.

You can make Vector2 have a distinct .Add(Vector2) method which does return a Vector2, but then if you also have a "func (v Vector3) Add(Vector3) Vector3" function, there is no way to declare an interface that both of those methods can meet, so you can not write any dimensionally-oblivious code that uses generic vector adding.
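A compilable sketch of both approaches, under some assumptions: the "Add" interface is renamed `Adder` here so the type and method names are easier to tell apart, `AddTyped` is a hypothetical name for the statically typed variant, and `Vector2`/`Vector3` are defined as fixed-size arrays purely for illustration.

```go
package main

import "fmt"

// Vector and Adder mirror the two interfaces sketched above.
type Vector interface {
	Components() []float64
}

type Adder interface {
	Add(Vector) Vector
}

type Vector2 [2]float64
type Vector3 [3]float64

func (v Vector2) Components() []float64 { return v[:] }
func (v Vector3) Components() []float64 { return v[:] }

// Add satisfies Adder, but the dimension check can only happen at run time.
func (v Vector2) Add(o Vector) Vector {
	c := o.Components()
	if len(c) != len(v) {
		panic("dimension mismatch")
	}
	return Vector2{v[0] + c[0], v[1] + c[1]}
}

// AddTyped is checked statically, but no single interface can describe
// both this method and the Vector3 one, because their signatures differ.
func (v Vector2) AddTyped(o Vector2) Vector2 {
	return Vector2{v[0] + o[0], v[1] + o[1]}
}

func (v Vector3) AddTyped(o Vector3) Vector3 {
	return Vector3{v[0] + o[0], v[1] + o[1], v[2] + o[2]}
}

func main() {
	var a Adder = Vector2{1, 2}

	sum := a.Add(Vector2{3, 4}) // static type is the interface, not Vector2
	v2 := sum.(Vector2)         // type assertion needed to get the concrete type back
	fmt.Println(v2)

	fmt.Println(Vector3{1, 2, 3}.AddTyped(Vector3{1, 1, 1}))

	// a.Add(Vector3{1, 2, 3}) would compile and then panic at run time.
}
```

The interface version accepts mismatched dimensions at compile time; the typed version rejects them but cannot be used from dimension-agnostic code.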

In "normal software engineering", Go's interface limitations are often not so bad, certainly not as bad as is often portrayed on HN. However, when you try to create a strongly-typed numeric system (and you want it to be strongly-typed because that's also how you get good performance), Go's interface mechanism is basically worthless.

catnaroek · on Sept 22, 2017

> and you want it to be strongly-typed because that's also how you get good performance

What you get performance from is the absence of dynamic checks, not the presence of static ones. Of course, in the absence of dynamic checks, you want static ones for your sanity's sake - but not for performance's sake!

jerf · on Sept 22, 2017

I was speaking in the context of Go. In general, this is the sort of code that JITs are so good at handling that they tend to fool people into thinking they are miracle workers everywhere else where the JIT expense isn't being amortized across million-row matrix multiplications. But Go doesn't have a JIT, and its performance is good enough that I don't expect one to emerge any time soon. (Languages running 50x slower than C have a lot more pressure to try to solve that problem with a JIT than languages that are only 2-3x slower than C.)

sbinet · on Sept 22, 2017

> But Go doesn't have a JIT, and its performance is good enough that I don't expect one to emerge any time soon.

that's true. that is... until a (real) Go interpreter shows up. something that's bound to happen when Go will be used for (data) exploratory work.

jerf · on Sept 22, 2017

I poked around with writing a Go interpreter a while back. There are a number of issues that make it practically infeasible. You can get some hacked-up stuff off of GitHub, but those hacked up things are pretty much the best you can do right now.

But as per my other comment in this thread, if the scientific community becomes big enough I wouldn't be surprised if they fork Go entirely, at which point that opens up a lot more options.

sbinet · on Sept 24, 2017

as I am working on https://github.com/go-interpreter/wagon, I'd be very interested in these issues you're talking about.

catnaroek · on Sept 22, 2017

Relying on a JIT to get good performance is arguably a bad thing anyway, since JIT compilation makes performance harder to predict. Of course, in the end you need to measure what you have, but you should also be able to make educated guesses about performance when you don't have something to measure, e.g., when you need to select between several alternative designs, and implementing all of them would be prohibitively expensive.

sythe2o0 · on Sept 21, 2017

Generics wouldn't necessarily bring in operator overloading, though. I haven't seen a Go proposal for generics that actually included it.

Recurecur · on Sept 22, 2017

I think Julia is a far better candidate for high performance numerics than Go. It's just a better designed language in general, it is already higher performance, and it's far more expressive than Go.

When the Julia AOT compilation story is complete, and it's well along now, Julia should dominate a whole lot of Go use cases...

egl2016 · on Sept 21, 2017

"By default, blas64 and lapack64 call the native Go implementations of the routines. Alternatively, it is possible to use C-based implementations of the APIs through the respective cgo packages and "Use" functions."

Performance comparison? Algorithmic equivalence? How close are the results numerically (e.g. how do they compare on badly conditioned matrices)?

howeman · on Sept 21, 2017

The algorithms are (basically) equivalent, and are translations from the Fortran (though row major instead of column major). As far as I know there are no major differences in the answers, though for extremely poorly conditioned matrices (1e14 or so) you shouldn't expect consistent answers across any implementation.

The performance story is complex. Typically we're the same speed on small matrices (and using Go is faster if you include the cgo overhead). We currently have significant speed penalties on large matrices (300x300 or so), but Kunde21 is working on assembly kernels for the BLAS functions to close that gap.

openasocket · on Sept 21, 2017

I'm surprised your performance is anywhere near that of standard BLAS implementations. The Golang compiler doesn't have support for explicit SIMD or auto-vectorization, so that's a big performance gain just sitting there.

howeman · on Sept 21, 2017

For small vectors and matrices the cgo overhead swamps the assembly speedups. For large vectors cache misses dominate, and the assembly doesn't matter as much. It does matter significantly for medium vectors and large matrices. In that case we provide cgo wrappers and are working on SIMD kernels.

sbinet · on Sept 22, 2017

I have been using Gonum for some time now (also contributed, mostly in the plotting area).

Last summer, I tried an experiment: have a student migrate a little python-based analysis to a Go-based one. The analysis was fitting some cosmological constants out of the so called Hubble diagram.

I was pleased to see that, in the span of 2-3 months, the student who had limited knowledge in programming (a bit of python), managed to pull off the minimization of a 740 supernovae dataset with a 2220x2220 nuisance parameters matrix.

and the run time was 2x faster than the python one (with scipy/minuit for the minimization, so everything in C/C++, really).

success. :)

(and this motivated us to completely switch to Go as a teaching language for our master in particle physics / cosmology.)

Recurecur · on Sept 22, 2017

I suggest you take a look at Julia. I think it's a much better fit for that type of work...

pbnjay · on Sept 22, 2017

I can't wait till the Go team starts giving these packages more love from the performance perspective. Now that the compiler has an SSA backend we might start seeing more SIMD and other optimizations, but it's still a ways to go before performance is comparable to bare C libraries for heavy computation and tight inner loops.

brian-armstrong · on Sept 22, 2017

Honestly, I don't think that Go is the right language for this. I've used Go quite a lot and it feels like it mostly just gets in your way. You can't really do memory management, which will likely impede performance for numerical work. There's no operator overloading either.

C++, for all its flaws, seems to just generally be a more well-conceived language and more generalist than Go. The only place I really feel like Go works is specifically in the context of moving bytes from one socket to another.

d4l3k · on Sept 22, 2017

What kind of memory management do you need that Go doesn't provide?

It's not too hard to write Go to minimize allocations (and most short lived allocations end up on the stack anyways unlike other languages [1]). If you really need a lot of allocations you can always use https://golang.org/pkg/sync/#Pool to avoid GC overhead.

[1] https://groups.google.com/d/msg/golang-nuts/KJiyv2mV2pU/wdBU...
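A minimal sketch of the sync.Pool suggestion, assuming a numeric routine that needs a scratch buffer on every call. `bufPool` and `scaledSum` are hypothetical names and the initial capacity of 1024 is arbitrary. Note that the pool reduces allocations rather than removing GC involvement entirely, since the runtime may drop pooled objects during a collection.

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool hands out reusable scratch slices. Pointers to slices are
// stored so that Put does not allocate a new interface value each time.
var bufPool = sync.Pool{
	New: func() interface{} {
		buf := make([]float64, 0, 1024) // arbitrary initial capacity
		return &buf
	},
}

// scaledSum computes the sum of k*x over xs, staging the scaled values
// in a pooled buffer instead of allocating a fresh slice per call.
func scaledSum(xs []float64, k float64) float64 {
	bp := bufPool.Get().(*[]float64)
	defer bufPool.Put(bp)

	buf := (*bp)[:0] // reuse the backing array, discard old contents
	for _, x := range xs {
		buf = append(buf, k*x)
	}

	var total float64
	for _, v := range buf {
		total += v
	}

	*bp = buf // keep the backing array if append had to grow it
	return total
}

func main() {
	fmt.Println(scaledSum([]float64{1, 2, 3}, 2))
}
```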

brian-armstrong · on Sept 22, 2017

But if you're doing that, you might as well just use a language with RAII, good scoping and unique pointers. A language like... C++

Khanthulhu · on Sept 21, 2017

What's the use case for this? Machine learning? Big data? General math use?

jbochi · on Sept 21, 2017

Here at The New York Times we are using it to power some of our recommendation algorithms. We are actually training the models with Python and serving them with Go using gonum.

Our library was just open sourced (and still in my personal account, until we add more documentation): https://github.com/jbochi/facts

avyfain · on Sept 21, 2017

This sounds really cool. Anywhere I could read more about the Python -> Go integration? Or are you just exporting the raw weight matrices?

bochi · on Sept 21, 2017

We are just exporting the matrices. Nothing fancy.

Khanthulhu · on Sept 22, 2017

Beautiful. Thanks for the reply

howeman · on Sept 21, 2017

General math use, like numpy/scipy.

optimuspaul · on Sept 21, 2017

wonder how this compares to numpy/scipy in terms of features and performance. Looks pretty comprehensive.

howeman · on Sept 21, 2017

We aren't at full feature parity, but we're pretty close. There are some big things we are missing (ODE, FFT), and we have a bunch of things they don't have (statistical distance measures being one example). We are trying to be pure-go, so it's not as simple as providing a wrapper API. Working on it though!

dm319 · on Sept 23, 2017

statistical distance measures? is that like tSNE and similar?