Recent Comments

Hi, I'd like to respond to a couple of the comments posted here.

First, regarding the post by pdhackett: the interface shown above is actually native C++, which is completely type-checked using normal mechanisms when you invoke your existing C++ compiler. RapidMind just provides some types and macros in a header file, using normal C++ semantics. There is no special preprocessor, as you were probably assuming, so what you see above is NOT ever turned into a string. RapidMind code IS ordinary, portable, ISO-standard C++ code. The C++ compiler *will* tell you if you have a malformed program.

What actually happens is that the RapidMind numerical types, like Value, are instrumented so that the RapidMind platform (which is linked in like a library) can observe the sequence of operations that are applied to them. BEGIN (which is just a macro wrapping a function call into the API) starts a "trace" of these operations, and END (another function call) stops the trace. In addition to this "retained" usage, Values also work in "immediate" mode as ordinary numerical types outside of BEGIN/END blocks. Immediate mode is handy for modifying non-local variables, as described below.

Once a trace has been captured, the platform uses a staged code generator at runtime (completely separate from the C++ code generator) to construct optimized machine language that reimplements that trace so it can run in parallel. "Program" objects are basically brand-new functions, built at runtime, that you can use to kick off (asynchronously, as it happens) a parallel version of the sequence of operations captured in the trace.

In other words, the RapidMind platform interface described above adds to C++ the capability to dynamically construct parallelized functions in a safe way. What this means is:

- C++ overhead is eliminated. Operations that are not on RapidMind types just act as "scaffolding" that organizes the sequence of numerical operations on RapidMind types.
This scaffolding is completely ignored by the platform's code generator. You can freely use all the C++ modularity constructs you want (classes, namespaces, virtual member functions, etc.), then "compile them out".
- The modularity and scope constructs of C++ get automatically transformed into interprocessor communication patterns. Non-local variables declared outside of a BEGIN/END block work for Programs as you would expect for functions. This means that binding between code on the "host" and code on the "co-processors" follows the same scope rules as the rest of C++, although the implementation is significantly more involved internally: the co-processor code may be running on separate processors that may not even share the same memory space. RapidMind hides this complexity completely.
- The "metaprogramming" approach enables some interesting alternative programming models, which can significantly reduce the size of code without reducing performance, and in some cases even enhance it significantly. For example, you can easily generate parameterized variants of functions programmatically, or variants that depend on data only known at runtime. In particular, you can turn interpreters into compilers trivially, which is a rather extreme form of overhead elimination and run-time dependency.

It should be noted that the staged code generator, since it operates on small kernels, is very fast.

All of this addresses an earlier comment: "C++ is not an HPC language". Our approach lets you have your cake (modularity and abstraction) and eat it too (performance). We've seen practical examples of portable code that is 1/10 the size and far easier to understand, yet delivers nearly twice the performance of a non-portable C implementation on the same hardware. Abstraction helps with some of the code size reduction, but the scope rules and the embedded approach get rid of all the annoying glue code as well. The "kernel" language IS the API. What you see above is IT.
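To make the tracing idea concrete, here is a minimal, self-contained sketch of the general technique: an instrumented numeric type that computes directly in "immediate" mode and records its operations while a trace is active. This is emphatically NOT the RapidMind API. Every name in it (`toy::Value`, `toy::Program`, `capture`, `make_horner`, `active_trace`) is hypothetical, the "Program" merely interprets the recorded trace instead of generating parallel machine code, and non-local values are simply frozen into the trace as constants, which is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy illustration only. None of these names are RapidMind's API.
namespace toy {

enum class Op { Input, Const, Add, Mul };
struct Node { Op op; std::size_t lhs; std::size_t rhs; float constant; };
using Trace = std::vector<Node>;

// Trace currently being recorded, or nullptr in "immediate" mode.
inline Trace*& active_trace() {
    static Trace* trace = nullptr;
    return trace;
}

struct Value {
    float imm;             // payload used in immediate mode
    bool traced = false;   // true if this value names a node in a trace
    std::size_t node = 0;  // index of that node

    Value(float x = 0.0f) : imm(x) {}

    static Value traced_at(std::size_t n) {
        Value v;
        v.traced = true;
        v.node = n;
        return v;
    }
    // The kernel's input: records an Input node in the active trace.
    static Value input() {
        Trace* t = active_trace();
        assert(t != nullptr);
        t->push_back({Op::Input, 0, 0, 0.0f});
        return traced_at(t->size() - 1);
    }
};

// Node index for v; an untraced value is frozen into the trace as a constant.
inline std::size_t lift(const Value& v) {
    if (v.traced) return v.node;
    active_trace()->push_back({Op::Const, 0, 0, v.imm});
    return active_trace()->size() - 1;
}

inline Value record(Op op, const Value& a, const Value& b) {
    Trace* t = active_trace();
    if (t == nullptr)  // immediate mode: behave like an ordinary number
        return Value(op == Op::Add ? a.imm + b.imm : a.imm * b.imm);
    const std::size_t ia = lift(a);
    const std::size_t ib = lift(b);
    t->push_back({op, ia, ib, 0.0f});  // retained mode: remember the operation
    return Value::traced_at(t->size() - 1);
}
inline Value operator+(const Value& a, const Value& b) { return record(Op::Add, a, b); }
inline Value operator*(const Value& a, const Value& b) { return record(Op::Mul, a, b); }

struct Program {
    Trace nodes;
    std::size_t result = 0;

    // Replays the trace for one input. A real system would generate code instead.
    float run(float x) const {
        std::vector<float> r(nodes.size());
        for (std::size_t i = 0; i < nodes.size(); ++i) {
            const Node& n = nodes[i];
            switch (n.op) {
                case Op::Input: r[i] = x; break;
                case Op::Const: r[i] = n.constant; break;
                case Op::Add:   r[i] = r[n.lhs] + r[n.rhs]; break;
                case Op::Mul:   r[i] = r[n.lhs] * r[n.rhs]; break;
            }
        }
        return r[result];
    }
};

template <class Body>
Program capture(Body body) {
    Program p;
    active_trace() = &p.nodes;  // plays the role of BEGIN
    p.result = lift(body(Value::input()));
    active_trace() = nullptr;   // plays the role of END
    return p;
}

// Ordinary C++ "scaffolding": the loop and the coefficient vector are used once,
// at capture time, and leave only multiplies, adds, and constants in the trace.
inline Program make_horner(const std::vector<float>& coeffs) {
    return capture([&](Value x) {
        Value acc = coeffs.empty() ? 0.0f : coeffs.front();
        for (std::size_t i = 1; i < coeffs.size(); ++i)
            acc = acc * x + coeffs[i];
        return acc;
    });
}

}  // namespace toy
```

In this sketch, `toy::make_horner({1.0f, 2.0f, 3.0f})` builds a Program for x^2 + 2x + 3 from coefficients known only at runtime, and calling `run(2.0f)` on it returns 11. The recorded trace contains two multiplies and two adds but no loop: the C++ control flow organized the operations and then disappeared, which is the "scaffolding" effect described above. Outside of `capture`, an expression such as `toy::Value(3.0f) * 2.0f + 1.0f` is evaluated on the spot, as in immediate mode.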
As for chitown76's comments: we have a lot of experience with the Cell too, and actually, code written this way can significantly outperform code written at a low level on the Cell, and with much less effort, because:
(a) you can try out more high-level optimizations faster;
(b) you can programmatically generate code with RapidMind that would be insane to try to build by hand;
(c) you can write parameterized code and then automatically search for the sweet spot for block sizes, loop unroll factors, etc.;
(d) the platform automates the more common optimizations, like double-buffering and prefetch, so they're always there, but are invisible and portable.

I should emphasize that we are not limited by the semantics of C++ in our code generator: the RapidMind platform has its own, which is designed around the use of vector operations. But if you really, really, really want to, and you are willing to break portability, you CAN use explicit asynchronous DMA transfers and assembly intrinsics through the interface. Most Cell SPU assembly instructions can be specified with a simple function call, for instance. Such drill-down is occasionally useful, but should only be done after profiling a more generic implementation, and should be hidden whenever possible inside a suitable abstraction (which our approach makes zero-cost). Also, higher-level transformations to the algorithm and the data layout often have a bigger impact on performance, and should not be neglected.

Simple things should be easy. Difficult things should be possible.

Michael McCool