Adding OpenCL on Dev-C++
Here is my feeble attempt at learning OpenCL; please don't make fun of me too much.
Dev-C++ can also be used in combination with Cygwin or any other GCC-based compiler. Here is a short tutorial that was pointed out to me for setting up Dev-C++ with OpenGL. The tutorial is from the collection of Programming Tutorials and Lecture Notes from the Computer Science Department of Central Connecticut State University.
May 19, 2014: SYCL offers simple abstractions to core OpenCL features. Rather than just putting C++ classes on top of OpenCL objects, these abstractions have been designed with C++ and object-oriented programming paradigms in mind. A typical illustration is a three-way vector addition implemented using SYCL.
Configuration
On this page you will find API references, tutorials, online resources, documentation downloads, etc. about C/C++ and Win32 programming. However, if you're looking for documentation about an add-on library, you should look at the page of that library.
To ensure that OpenCL code is portable to many devices, the default way to run kernels is with just-in-time (JIT) compilation. We must prepare the source code for the device(s) in a given context. First we create our program, which is a set of kernel code, and then from that program we create the individual kernels.
This code uses OpenCL 1.1 on an NVIDIA GPU.
Linux
(Only tested on Ubuntu.) For NVIDIA GPUs, I've installed the following packages: nvidia-346 nvidia-346-dev nvidia-346-uvm nvidia-libopencl1-346 nvidia-modprobe nvidia-opencl-icd-346 nvidia-settings. Since the opencl-headers package in the main repository is for OpenCL 1.2, you can get the OpenCL 1.1 header files from here.
Then to compile the C++ code:
To compile the C code:
For examples 04 and 05, you can run
OS X
OpenCL is installed on OS X by default, but since this code uses the C++ bindings, you'll need to get those too. Get the official C++ bindings from the OpenCL registry and copy them to the OpenCL framework directory, or do the following:
To compile:
Windows
For some reason, the makefile didn't want to work for Windows. I have no idea why.
For example 04, run (inside the directory):
where PATH/TO/CLFFT is the path to the clFFT library.
For example 05, run (inside the directory):
where PATH/TO/FFTW is the path to the FFTW3 library.
example 00
This example is based off of this example (example-ception), but it goes a bit further. In the blogspot example, two 10-element vectors are created and one thread is used for each pair of elements. In this example, 10 threads are spawned but two 100-element vectors are used, which shows how to assign a specific number of elements to each thread.
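To make the "elements per thread" idea concrete, here is a minimal host-only sketch in plain C++. It is not the code from example 00: the function name `add_in_chunks` is made up for illustration, and the outer loop only stands in for the work-items that would run in parallel on the device, with `gid` playing the role of `get_global_id(0)`.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from example 00): adds a and b the way a kernel
// launched with `num_work_items` work-items might, each work-item handling
// one contiguous chunk. Assumes a.size() == b.size() and num_work_items > 0.
std::vector<float> add_in_chunks(const std::vector<float>& a,
                                 const std::vector<float>& b,
                                 std::size_t num_work_items) {
    const std::size_t n = a.size();
    std::vector<float> c(n);
    // Ceiling division, so a trailing partial chunk is still covered.
    const std::size_t per_item = (n + num_work_items - 1) / num_work_items;
    // Sequential stand-in for the parallel work-items.
    for (std::size_t gid = 0; gid < num_work_items; ++gid) {
        const std::size_t begin = gid * per_item;
        const std::size_t end = std::min(begin + per_item, n);
        for (std::size_t i = begin; i < end; ++i) {
            c[i] = a[i] + b[i];
        }
    }
    return c;
}
```

With 100 elements and 10 work-items, `per_item` is 10, so work-item 0 covers indices 0 to 9, work-item 1 covers 10 to 19, and so on. The `std::min` clamp only matters when the element count is not a multiple of the work-item count.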
example 01
Measures the duration of adding two vectors. See the README in the folder for more details.
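For comparison, a host-side baseline can be timed with `std::chrono`. This is only a sketch of the general timing pattern, not the measurement code from example 01 (see the README in that folder for what it actually does); the name `timed_add_us` is invented here.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from example 01): adds a and b into `out` on the
// host and returns the elapsed wall-clock time in microseconds.
// Assumes a.size() == b.size().
double timed_add_us(const std::vector<float>& a,
                    const std::vector<float>& b,
                    std::vector<float>& out) {
    out.resize(a.size());
    // steady_clock is monotonic, so it is safe for measuring intervals.
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count();
}
```

If the same pattern is wrapped around an OpenCL kernel launch, the queue has to be drained (for example with `clFinish`) before reading the stop time, because enqueueing a kernel returns before the kernel has finished.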
example 02
Demonstrates that one array can be modified several times without having to re-read and re-write data to and from the GPU.
example 03
A simple example using the cl_khr_fp64 extension, which allows for usage of doubles instead of floats.
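Not every device supports doubles, so it can be worth checking for the extension before building a kernel that needs it. OpenCL reports device extensions as a space-separated string (`CL_DEVICE_EXTENSIONS`). The sketch below only does the string check; `has_extension` is a made-up helper, not part of OpenCL or of example 03.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not from example 03): returns true if `extensions`,
// a space-separated list in the format of CL_DEVICE_EXTENSIONS, contains
// `name` as a whole token (so "cl_khr_fp64_foo" does not match "cl_khr_fp64").
bool has_extension(const std::string& extensions, const std::string& name) {
    std::istringstream tokens(extensions);
    std::string token;
    while (tokens >> token) {
        if (token == name) {
            return true;
        }
    }
    return false;
}
```

With the C++ bindings, the input string would presumably come from `device.getInfo<CL_DEVICE_EXTENSIONS>()`, and the kernel source itself enables the extension with `#pragma OPENCL EXTENSION cl_khr_fp64 : enable`.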
example 04
An example of the clFFT library for an in-place complex-planar transform. There is also Python code to check the answer; FFTW code will be added later, probably.
- clFFT is required; installation instructions can be found inside example04/README.md
- for Python, numpy and scipy are required
example 05
Another clFFT example where an in-place real transform and an out-of-place real transform are performed. There's also FFTW code and Python code for checking the answer.
- clFFT is required; installation instructions can be found inside example04/README.md
- FFTW is required; installation is as simple as extracting FFTW's tar file, then running
./configure && sudo make && sudo make install
- for Python, numpy and scipy are required
Some Notes
From the guide on programming OpenCL for NVIDIA:
- CUDA streaming multiprocessor corresponds to an OpenCL compute unit
- CUDA thread corresponds to an OpenCL work-item
- CUDA thread block corresponds to an OpenCL work-group
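The same correspondence carries over to index arithmetic. As a small sketch in plain C++ (the function name `global_index` is invented for illustration), the global index of a thread or work-item in one dimension is built from the block/work-group index, the block/work-group size, and the local index:

```cpp
#include <cstddef>

// CUDA:   blockIdx.x * blockDim.x + threadIdx.x
// OpenCL: get_group_id(0) * get_local_size(0) + get_local_id(0),
//         which equals get_global_id(0) when the global offset is zero.
// Hypothetical helper computing that value on the host.
std::size_t global_index(std::size_t group_id,
                         std::size_t local_size,
                         std::size_t local_id) {
    return group_id * local_size + local_id;
}
```

For example, work-item 5 of work-group 3 with a work-group size of 64 has global index 3 * 64 + 5 = 197.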
On Mac OS X 10.9 Mavericks I couldn't get C++ OpenCL code to run. I guess OpenCL on Mac OS X doesn't ship the C++ wrapper, so I manually added the wrapper header to fix the issue. I followed the steps below to make it work.
- Download cl.hpp from http://www.khronos.org/registry/cl/api/1.1/cl.hpp
- Move the downloaded cl.hpp file into the /System/Library/Frameworks/OpenCL.framework/Headers/ directory.
- When including in C++, use #include <OpenCL/cl.hpp> instead of #include <OpenCL/opencl.h>
#include <iostream>

#define __NO_STD_VECTOR // Use cl::vector instead of STL version
#define __CL_ENABLE_EXCEPTIONS

#ifdef __APPLE__
//#include <OpenCL/opencl.h>
#include <OpenCL/cl.hpp> /* read cpp_wrapper_fix.txt */
#else
#include <CL/cl.hpp>
#endif

int main(int, char**) {
    // Query every OpenCL platform installed on this machine.
    cl::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);

    int platform_id = 0;
    int device_id = 0; // keeps counting across platforms

    std::cout << "Number of Platforms: " << platforms.size() << std::endl;

    for (cl::vector<cl::Platform>::iterator it = platforms.begin(); it != platforms.end(); ++it) {
        cl::Platform platform(*it);

        std::cout << "Platform ID: " << platform_id++ << std::endl;
        std::cout << "Platform Name: " << platform.getInfo<CL_PLATFORM_NAME>() << std::endl;
        std::cout << "Platform Vendor: " << platform.getInfo<CL_PLATFORM_VENDOR>() << std::endl;

        // List both GPU and CPU devices; the device types are OR-ed bit flags.
        cl::vector<cl::Device> devices;
        platform.getDevices(CL_DEVICE_TYPE_GPU | CL_DEVICE_TYPE_CPU, &devices);

        for (cl::vector<cl::Device>::iterator it2 = devices.begin(); it2 != devices.end(); ++it2) {
            cl::Device device(*it2);

            std::cout << "\tDevice " << device_id++ << ": " << std::endl;
            std::cout << "\t\tDevice Name: " << device.getInfo<CL_DEVICE_NAME>() << std::endl;
            std::cout << "\t\tDevice Type: " << device.getInfo<CL_DEVICE_TYPE>();
            std::cout << " (GPU: " << CL_DEVICE_TYPE_GPU << ", CPU: " << CL_DEVICE_TYPE_CPU << ")" << std::endl;
            std::cout << "\t\tDevice Vendor: " << device.getInfo<CL_DEVICE_VENDOR>() << std::endl;
            std::cout << "\t\tDevice Max Compute Units: " << device.getInfo<CL_DEVICE_MAX_COMPUTE_UNITS>() << std::endl;
            std::cout << "\t\tDevice Global Memory: " << device.getInfo<CL_DEVICE_GLOBAL_MEM_SIZE>() << std::endl;
            std::cout << "\t\tDevice Max Clock Frequency: " << device.getInfo<CL_DEVICE_MAX_CLOCK_FREQUENCY>() << std::endl;
            std::cout << "\t\tDevice Max Allocateable Memory: " << device.getInfo<CL_DEVICE_MAX_MEM_ALLOC_SIZE>() << std::endl;
            std::cout << "\t\tDevice Local Memory: " << device.getInfo<CL_DEVICE_LOCAL_MEM_SIZE>() << std::endl;
            std::cout << "\t\tDevice Available: " << device.getInfo<CL_DEVICE_AVAILABLE>() << std::endl;
        }
        std::cout << std::endl;
    }
}
# Makefile for device_query.cpp
UNAME_S := $(shell uname -s)

# -std=c++11 -Wall -march=native

ifeq ($(UNAME_S),Linux)
CXX=clang++
CPPFLAGS=-O3
LDFLAGS=-O3
LDLIBS=-lOpenCL
endif

ifeq ($(UNAME_S),Darwin)
CXX=clang++
CPPFLAGS=-O3
LDFLAGS=-O3
LDLIBS=-framework OpenCL
endif

RM=rm -f
SRCS=device_query.cpp
OBJS=device_query.o
EXEC=device_query

.PHONY: all clean dist-clean

all: $(OBJS)
	$(CXX) $(LDFLAGS) -o $(EXEC) $(OBJS) $(LDLIBS)

%.o: %.cpp
	$(CXX) $(CFLAGS) $(CPPFLAGS) -c $<

clean:
	$(RM) $(OBJS) $(EXEC)

dist-clean:
	$(RM) $(EXEC)