EasyOpenCL – The easiest way to get started with GPU programming

scott_s · on Aug 22, 2015

There are actually quite a few libraries like this floating around, although I'm not sure how many are still actively supported.

Thrust: http://thrust.github.io/

VexCL: http://ddemidov.github.io/vexcl/

Boost.Compute: http://boostorg.github.io/compute/

The author of VexCL compared them two years ago: http://stackoverflow.com/questions/20154179/differences-betw...

oneofthose · on Aug 23, 2015

There is another library authored by me and some colleagues. It is called Aura: https://github.com/sschaetz/aura

I blogged about these kinds of libraries here (overview): http://www.soa-world.de/echelon/2014/04/c-accelerator-librar...

A new addition is welcome, as we still haven't found the perfect API for accelerator programming. EasyOpenCL seems very simple and easy to use, but I feel it is very restricted.

For getting started with OpenCL development these days I would recommend PyOpenCL. Since everything is in Python, data can be generated easily and results can be plotted with well-known Python tools, which simplifies debugging. Kernels developed in PyOpenCL can be copied directly to other APIs (the raw OpenCL C API or one of the other C/C++ wrappers) and reused in production code.

Polytonic · on Aug 24, 2015

I think the problem with (most of?) these libraries is that they don't solve the fundamental problem, which is: the OpenCL API is awful to work with. Various strategies have been attempted, mainly involving some form of "wrap the C API in less verbose C++!"

ginko · on Aug 23, 2015

There is also SYCL[1], which is backed by Khronos (the standards body behind OpenCL, OpenGL, Vulkan and others).

[1]https://www.khronos.org/sycl

dragandj · on Aug 23, 2015

I must shamelessly plug my own ClojureCL library for doing this sort of programming from Clojure.

http://clojurecl.uncomplicate.org

lfowles · on Aug 23, 2015

pocl seems to be another: http://portablecl.org/

paulmd · on Aug 23, 2015

Thrust is for CUDA. But there's also the Bolt framework for OpenCL.

EasyOpenCL sounds quite similar to these STL-style libraries.

bratsche · on Aug 23, 2015

What's the license on this? There doesn't seem to be anything about that in their Github repo.

Gladdyu · on Aug 23, 2015

I've added the GPLv3 licence [1]. To be honest, I didn't think it would get this much attention, so licensing didn't cross my mind.

[1] https://github.com/Gladdy/EasyOpenCL/commit/da59775e94b580d4...

nadams · on Aug 23, 2015

I really don't know why you were downvoted; this is a somewhat important question.

Though by the GitHub ToS you at least get to fork the repo [1].

[1] https://help.github.com/articles/github-terms-of-service/#f-...

wyldfire · on Aug 23, 2015

I find that it's easier to submit a PR than ask on HN. Usually omitting a license is an unintentional error.

nadams · on Aug 24, 2015

Why would you submit a PR? Wouldn't you just open an issue, since you don't want to choose a license for their project? Either way, my experiences with PRs have been really negative (and I'm not surprised).

wyldfire · on Aug 27, 2015

Meh, they just need a push in the right direction. If they didn't care to post a license in the first place they might not care which one I pick for them. But you're right, in this case they took the PR as an opportunity to select a license but decided not to take the one I suggested.

exDM69 · on Aug 23, 2015

This seems to be a library to make it really easy to invoke a single GPU kernel on some input buffers that are copied from the CPU (a std::vector). Unfortunately, most practical GPGPU tasks aren't like that.

The latency of getting data from the CPU to the GPU and back is bad enough that for a small quantity of data (low megabytes), it's better just to compute it on the CPU. More practical tasks usually involve several kernel invocations, and keeping the data at the GPU is essential for any kind of decent performance.
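To put rough numbers on the transfer argument, here is a back-of-envelope model in Python. It's only a sketch: the link bandwidth, latency and FLOP rates are made-up inputs, and it ignores overlapping copies with compute, pinned memory, and integrated GPUs that share host memory.

```python
# Back-of-envelope model of offloading work to a GPU over PCIe.
# Every rate passed in is a hypothetical input, not a measurement.

def transfer_seconds(num_bytes: float, bandwidth_gb_s: float, latency_s: float) -> float:
    """Fixed per-transfer latency plus size divided by link bandwidth."""
    return latency_s + num_bytes / (bandwidth_gb_s * 1e9)


def offload_estimate(bytes_in, bytes_out, flops_per_pass, passes,
                     cpu_gflops, gpu_gflops, bw_gb_s, latency_s,
                     keep_on_device=True):
    """Return (cpu_seconds, gpu_seconds) for `passes` kernel invocations.

    keep_on_device=True copies the input once and the output once;
    False round-trips the data through host memory on every pass.
    """
    cpu = passes * flops_per_pass / (cpu_gflops * 1e9)
    one_round_trip = (transfer_seconds(bytes_in, bw_gb_s, latency_s)
                      + transfer_seconds(bytes_out, bw_gb_s, latency_s))
    copies = one_round_trip if keep_on_device else passes * one_round_trip
    gpu = copies + passes * flops_per_pass / (gpu_gflops * 1e9)
    return cpu, gpu


if __name__ == "__main__":
    # Hypothetical machine: 50 GFLOP/s CPU, 1 TFLOP/s GPU, 8 GB/s link, 10 us latency.
    machine = dict(cpu_gflops=50, gpu_gflops=1000, bw_gb_s=8, latency_s=1e-5)
    # 1M floats in and out (4 MB each), one flop per element, one pass.
    print(offload_estimate(4e6, 4e6, 1e6, 1, **machine))
    # Same buffers, 1000 heavy passes, data kept on the GPU vs. round-tripped.
    print(offload_estimate(4e6, 4e6, 1e9, 1000, **machine))
    print(offload_estimate(4e6, 4e6, 1e9, 1000, keep_on_device=False, **machine))
```

With those made-up numbers a single light pass is roughly 50x slower on the GPU (about 1 ms of copying against 20 µs of CPU work), while 1000 heavy passes take about 1 s with the data kept on the device, about 2 s when it is round-tripped every pass, and 20 s on the CPU.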

But there are cases where executing a single kernel over some buffers would be useful (especially in early development or prototyping). In those cases, I'd like to write ZERO host-side code and use a CLI or GUI tool to run the code. So what I'd like to see is something like:

    $ cl-cli --kernel=frobnicate.cl --input0=foos.bin --input1=bars.bin --output0=bazs.bin

Does such a tool exist already?
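As far as I know the flags above are just a proposal, so treat this as a hypothetical sketch of how a tool like cl-cli might turn them into ordered buffer lists. The indexing rules (indices start at 0, no gaps, no duplicates) are my assumptions, not an existing tool's behaviour.

```python
import re

# Hypothetical flag grammar for the proposed cl-cli tool (no such tool is assumed
# to exist): --kernel=FILE, --inputN=FILE, --outputN=FILE with N = 0, 1, 2, ...
_FLAG = re.compile(r"--(kernel|input|output)(\d*)=(.+)")


def _ordered(slots, kind):
    """Turn {index: path} into a list, insisting on indices 0..N-1 without gaps."""
    if sorted(slots) != list(range(len(slots))):
        raise ValueError(f"{kind} indices must run 0..N-1 without gaps")
    return [slots[i] for i in range(len(slots))]


def parse_cl_cli(argv):
    """Return (kernel_path, input_paths, output_paths) from cl-cli style flags."""
    kernel = None
    inputs, outputs = {}, {}
    for arg in argv:
        match = _FLAG.fullmatch(arg)
        if match is None:
            raise ValueError(f"unrecognised argument: {arg}")
        kind, index, value = match.groups()
        if kind == "kernel":
            if index:
                raise ValueError("--kernel takes no index")
            kernel = value
        else:
            if not index:
                raise ValueError(f"--{kind} needs an index, e.g. --{kind}0=")
            slots = inputs if kind == "input" else outputs
            if int(index) in slots:
                raise ValueError(f"duplicate --{kind}{index}")
            slots[int(index)] = value
    if kernel is None:
        raise ValueError("--kernel is required")
    return kernel, _ordered(inputs, "input"), _ordered(outputs, "output")


if __name__ == "__main__":
    argv = ["--kernel=frobnicate.cl", "--input0=foos.bin",
            "--input1=bars.bin", "--output0=bazs.bin"]
    print(parse_cl_cli(argv))
```

From there the tool would presumably read each .bin file as raw bytes into a buffer, bind the inputs and then the outputs as consecutive kernel arguments, and size the launch from the buffer lengths; all of that is assumed here, not specified anywhere.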

It would be even better if this would allow building proper pipelines of multi-kernel programs by defining the inputs and outputs to kernels using a directed acyclic graph.
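The DAG part is mostly a topological sort. Here's a minimal Python sketch, assuming each kernel is described only by the names of the buffers it reads and writes (the kernel and buffer names are invented). It also reports which buffers have to cross the bus, since anything that is both produced and consumed inside the graph can stay on the GPU.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each kernel is described only by the buffer names it reads and writes.
# The pipeline below is a made-up example, not tied to any real library.


def launch_order(kernels):
    """Order kernels so every buffer is written before anyone reads it."""
    producer = {}
    for name, (_, outs) in kernels.items():
        for buf in outs:
            if buf in producer:
                raise ValueError(f"{buf!r} is written by {producer[buf]} and {name}")
            producer[buf] = name
    # Edge producer -> consumer; ignore a kernel that updates its own buffer in place.
    deps = {
        name: {producer[b] for b in ins if b in producer and producer[b] != name}
        for name, (ins, _) in kernels.items()
    }
    return list(TopologicalSorter(deps).static_order())  # raises CycleError on cycles


def host_transfers(kernels):
    """Buffers that must be uploaded (read but never produced) and downloaded
    (produced but never read); everything else can stay on the device."""
    produced = {b for _, outs in kernels.values() for b in outs}
    consumed = {b for ins, _ in kernels.values() for b in ins}
    return consumed - produced, produced - consumed


if __name__ == "__main__":
    pipeline = {
        "filter": (["spectrum", "mask"], ["filtered"]),
        "blur": (["image"], ["blurred"]),
        "fft": (["blurred"], ["spectrum"]),
    }
    print(launch_order(pipeline))  # ['blur', 'fft', 'filter']
    print(host_transfers(pipeline))
```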

I do not intend to dishearten you, OP, but think about this when you consider future direction to take with your project.

matthiasv · on Aug 23, 2015

We built something like this for image processing tasks: https://github.com/ufo-kit/ufo-core

There is also a CLI that in principle allows what you want to do, e.g.

     $ ufo-launch read path=foos.bin ! opencl filename=frobnicate.cl kernel=frobnicate ! fft ! blur ! write filename=bars.tif

Gladdyu · on Aug 23, 2015

The framework allows for partial data updates: for instance, for a 3D renderer it suffices to push the new position to the GPU while the vertex data remains in GPU memory. If you invoke the kernel function again it will neither recompile the kernel nor re-upload the vertex data.
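For illustration, here's a hypothetical sketch of the bookkeeping behind that kind of partial update: compile once, remember which buffers changed, and re-upload only those. It is not EasyOpenCL's code; CountingDevice is a stand-in that records compiles and uploads instead of talking to OpenCL, where the upload would be something like clEnqueueWriteBuffer into a buffer that persists between runs.

```python
from array import array


class CountingDevice:
    """Stand-in for a GPU context that only records what would happen."""

    def __init__(self):
        self.compiles = 0
        self.uploads = []  # names of buffers copied host -> device
        self.launches = 0

    def compile(self, source):
        self.compiles += 1
        return ("program", source)

    def upload(self, name, data):
        self.uploads.append(name)


class CachedKernel:
    """Compile once, and re-upload only buffers touched since the last run."""

    def __init__(self, device, source):
        self.device = device
        self.source = source
        self.program = None
        self.buffers = {}
        self.dirty = set()

    def set_buffer(self, name, data):
        self.buffers[name] = data
        self.dirty.add(name)

    def run(self):
        if self.program is None:
            self.program = self.device.compile(self.source)
        for name in sorted(self.dirty):  # sorted only to make the order predictable
            self.device.upload(name, self.buffers[name])
        self.dirty.clear()
        self.device.launches += 1  # a real runner would enqueue the kernel here


if __name__ == "__main__":
    dev = CountingDevice()
    k = CachedKernel(dev, "/* kernel source */")
    k.set_buffer("vertices", array("f", [0.0] * 30000))
    k.set_buffer("position", array("f", [0.0, 0.0, 0.0]))
    k.run()
    k.set_buffer("position", array("f", [1.0, 0.0, 0.0]))  # camera moved
    k.run()
    print(dev.compiles, dev.uploads)  # 1 ['position', 'vertices', 'position']
```

An explicit set_buffer call is used instead of hashing buffer contents on every run, since hashing would cost a full pass over the (large) vertex data each frame.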

The DAG idea sounds fun to build and very useful; I have some spare time anyway, so I'll see what I can whip up. As for the command-line interface, it also sounds pretty useful and should only take a bit of parsing, as all the OpenCL-related code has been written already. But ufo-launch already performs pretty much the same function, so it's not very high on my todo list.

exDM69 · on Aug 23, 2015

> The framework allows for partial data updates

Good! Additionally, it would be useful to memory map buffers and allow using raw pointers in addition to std::vectors. But more importantly, it would be necessary to use the output of one kernel as the input of another kernel invocation.

Anyway, build it to suit your use case primarily. Happy hacking!

Polytonic · on Aug 24, 2015

Sorry to keep plugging my stuff, but you might be interested in this ... https://github.com/Polytonic/Chlorine

No raw pointers, but you can use C arrays (was that what you meant by raw pointers?).

oneofthose · on Aug 23, 2015

Interesting idea. It should be only a few lines in PyOpenCL to build something like this.

But if you're already in PyOpenCL, I guess I would also prefer to generate the .bin files there (maybe using numpy) and evaluate the output (possibly with matplotlib). For optimization you could run the kernel in a loop, time it, and vary the global and local work sizes.
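A sketch of that tuning loop in plain Python, so it runs without a GPU. launch is any callable that enqueues the kernel with the given sizes and blocks until it finishes (in PyOpenCL, calling the kernel and then .wait() on the returned event). The power-of-two candidates and the 256 cap are assumptions; a real version would take the cap from the device's CL_DEVICE_MAX_WORK_GROUP_SIZE.

```python
import time


def candidate_local_sizes(global_size, max_local=256):
    """Powers of two up to max_local that divide global_size evenly
    (OpenCL 1.x rejects a local size that doesn't divide the global size)."""
    sizes, s = [], 1
    while s <= max_local:
        if global_size % s == 0:
            sizes.append(s)
        s *= 2
    return sizes


def tune(launch, global_size, max_local=256, repeats=5):
    """Time launch(global_size, local_size) for each candidate; return the fastest."""
    timings = {}
    for local in candidate_local_sizes(global_size, max_local):
        launch(global_size, local)  # warm-up: first launch may include one-off setup
        start = time.perf_counter()
        for _ in range(repeats):
            launch(global_size, local)  # must block until the kernel has finished
        timings[local] = (time.perf_counter() - start) / repeats
    best = min(timings, key=timings.get)
    return best, timings


if __name__ == "__main__":
    # Fake launcher standing in for a real kernel: pretends 64 is the sweet spot.
    def fake_launch(global_size, local_size):
        time.sleep(0.0 if local_size == 64 else 0.001)

    best, timings = tune(fake_launch, global_size=1 << 20)
    print(best, sorted(timings))
```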

Polytonic · on Aug 22, 2015

I wrote something similar a while back (https://github.com/Polytonic/Chlorine). Always good to see more attention paid to OpenCL though!

gjulianm · on Aug 22, 2015

Seems nice! I would use this to avoid all the OpenCL boilerplate code. However, there's one inconvenience: why require all the vectors to be the same size? I see that the size is used to set the work-group size. I think giving the option to pass arrays of any size and letting the client set the work-group size wouldn't add much complexity to either the code or the API.
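One common way to lift the restriction, not necessarily the one EasyOpenCL would pick: choose the local size, round the global size up to a multiple of it, and guard the kernel so the extra work-items do nothing. Buffers can then have whatever lengths the kernel needs. Below is a Python emulation with the equivalent OpenCL C kernel in the docstring; the scale kernel itself is a made-up example.

```python
# Decoupling the launch size from the buffer sizes: round the global size up to a
# multiple of the chosen local size and let surplus work-items do nothing.


def padded_global_size(n, local_size):
    """Smallest multiple of local_size that is >= n."""
    return ((n + local_size - 1) // local_size) * local_size


def emulate_scale(data, coeffs, local_size):
    """CPU emulation of this hypothetical kernel:

        __kernel void scale(__global float *data, __constant float *coeffs,
                            int n, int m) {
            int gid = get_global_id(0);
            if (gid < n)                 // guard: padded work-items exit early
                data[gid] *= coeffs[gid % m];
        }

    `data` and `coeffs` deliberately have different lengths.
    """
    n, m = len(data), len(coeffs)
    out = list(data)
    for gid in range(padded_global_size(n, local_size)):
        if gid < n:
            out[gid] *= coeffs[gid % m]
    return out


if __name__ == "__main__":
    print(padded_global_size(1000, 64))  # 1024
    print(emulate_scale([1.0] * 5, [2.0, 3.0], local_size=4))  # [2.0, 3.0, 2.0, 3.0, 2.0]
```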

Apart from that, really nice work, the code is well written and commented, it's a joy reading things like that.

Gladdyu · on Aug 22, 2015

I still have plans to auto-derive the optimal work/global/local group sizes, but that will take some more work.

Therefore, I just implemented the most basic straightforward alternative (which is indeed rather restrictive at the moment) as a temporary solution.

gjulianm · on Aug 23, 2015

That's great! IIRC, OpenCL can calculate the local work-group size automatically, so you only have to provide the global size.

pen2l · on Aug 22, 2015

CUDA is probably the way to go; if you have to use a GPU anyway, you might as well get one of the new NVIDIA GPUs.

Gladdyu · on Aug 23, 2015

CUDA has a nice toolchain, but the point of this is to remove all of the low-level stuff. For performance (even on NVIDIA cards) it doesn't really matter whether you use CUDA or OpenCL [1], and as a bonus OpenCL runs everywhere.

[1] http://pds.ewi.tudelft.nl/pubs/papers/icpp2011a.pdf

dr_zoidberg · on Aug 23, 2015

Added bonus: OpenCL runs on multicore processors, so you don't even need a GPU (though performance is obviously lower).

lfowles · on Aug 23, 2015

Not a whole lot lower, though: only in the 10-100x range from what I've experimented with (a card a couple of generations old vs. an i7 a couple of generations old, with a straight-up "process some floats" test[0]). Certainly enough to test your kernels against.

[0]: https://github.com/krrishnarraj/clpeak

dr_zoidberg · on Aug 23, 2015

Good work, especially on keeping the results from various platforms!

lfowles · on Aug 23, 2015

Not mine! Just the tool I used

TsiCClawOfLight · on Aug 23, 2015

No thanks, I'd rather use something that works on more than Nvidia. Better or not, that's too proprietary for me.