## GitHub code repository

I always like it when people provide clean implementations of their work which can be dropped easily into a project without much programmer effort.  Stefan Gustavson and Ian McEwan have done this at webgl-noise so I’ve decided to take their lead and do the same.  I’ve created a GitHub repository which can be found here https://github.com/BrianSharpe and also by clicking the “code” button in the blog menu bar above.

The first commit has been of the work so far…

• BBS, SGPP and FastHash32 hashing implementations
• C1, C2 and C3 interpolation and falloff functions

Also included are some optimal texture-free 2D + 3D Value, Perlin and ValuePerlin noise implementations.

2D+3D sampling of Value Noise

2D+3D sampling of Perlin Noise ( gradient noise )

2D+3D sampling of ValuePerlin Noise ( value-gradient noise )

## A fast and simple 32bit floating point hash function

In a previous post I reviewed two existing floating point hash functions which are suitable for use in fragment shaders.  These were the BlumBlumShub and Permutation Polynomial hash functions used in MNoise and WebGLNoise.  If curious, here is a link to the post https://briansharpe.wordpress.com/2011/10/01/gpu-texture-free-noise/

I’ve become disheartened with both.  They both contain artifacts and are slow to calculate.  The BBS hash is still useful because it can run on 16bit and 24bit floating point hardware.  But now that we have 32bit floating point hardware I’ve decided make a floating point hashing function of my own.

The hash function which is working best for me takes the form hash = mod( coord.x * coord.x * coord.y * coord.y, SOMELARGEFLOAT ) / SOMELARGEFLOAT.  The 3D version simply offsets the SOMELARGEFLOAT value by a fraction of the Z coordinate.  It takes some time to find constants which give good visual results and also to find a specific area of the noise which is most free from artifacts.  It would be good to one day use some kind of computational process to find constants which give the best possible noise, but for now my hand-picked values should work ok.

One thing to note is the x*x*y*y operation pushes the floating point number well over 2^23 causing a loss of precision, which is where a lot of the “randomness” comes from.  Because of this I’m not sure if it will give identical results across cards.  More investigation is required.  But for now its giving good results on my CPU and ATI card so I’m going to stick with it.  : )

As with the previous hash functions, here is some GLSL code which implements the new 32bit hash function to calculate pseudo random 0.0->1.0 hash values for the 4 corners of a 2D integer grid cell.

##### ```// //    FAST_32_hash //    A very fast 2D hashing function.  Requires 32bit support. // //    The hash formula takes the form.... //    hash = mod( coord.x * coord.x * coord.y * coord.y, SOMELARGEFLOAT ) / SOMELARGEFLOAT //    We truncate and offset the domain to the most interesting part of the noise. // vec4 FAST_32_hash( vec2 gridcell ) { //    gridcell is assumed to be an integer coordinate const vec2 OFFSET = vec2( 26.0, 161.0 ); const float DOMAIN = 71.0; const float SOMELARGEFLOAT = 951.135664; vec4 P = vec4( gridcell.xy, gridcell.xy + 1.0.xx ); P = P - floor(P * ( 1.0 / DOMAIN )) * DOMAIN;    //    truncate the domain P += OFFSET.xyxy;                                //    offset to interesting part of the noise P *= P;                                          //    calculate and return the hash return fract( P.xzxz * P.yyww * ( 1.0 / SOMELARGEFLOAT.x ).xxxx ); } ```

Here is an image showing a 61×61 sampling of the Fast 32bit hashing function  ( 61×61 for comparison with BBS and SGPP, the actual domain is 72×72 ).

A 61×61 sampling of the Fast 32bit floating point hash function

Posted in Uncategorized | 4 Comments

## Useful Interpolation Functions for Noise Development

Interpolation functions allow us to calculate a transition between discreet points.  They are used in noise development.   Here are some common types.

Smooth interpolation of a linear input from 0.0 to 1.0

This is a very common interpolation function.  It calculates a smooth transition from 0.0 to 1.0 as the input increases linearly from 0.0 to 1.0.  By smooth I mean the gradient is 0.0 at 0.0, 0.0 at 1.0 and positive in between.  An example is the GLSL function SmoothStep().

Ken Perlin uses a cubic hermite curve in his original classic noise.  It is of the form f(x) = 3x^2-2x^3.  A problem is that the 2nd derivative is non-zero at 0.0 and 1.0.  This can result in a sharp transition when the noise is used to calculate a normal for shading.  Ken solved the problem with a quintic hermite curve of the form f(x) = 6x^5-15x^4+10x^3.  This new function has both 1st and 2nd derivatives of 0.0 at 0.0 and 1.0.  More details can be found in his paper.  http://mrl.nyu.edu/~perlin/paper445.pdf

The cubic curve is said to be 1st derivative continuous ( C1 for short ).  The quintic curve is said to be 1st and 2nd derivative continuous ( C2 for short ).

Here are 4 implementations of this interpolation function.  The first two are the previously mentioned cubic and quintic curves.  The third is a more efficient C2 interpolation function which contains 1 less multiply than Perlin’s quintic curve.  The forth is a C3 interpolation function.  Their graph can be seen here.

##### ```// //    Interpolation functions //    ( smoothly increase from 0.0 to 1.0 as x increases linearly from 0.0 to 1.0 ) // //    Cubic Hermine Curve.  Same as SmoothStep().  As used by Perlin in Original Noise. //    3x^2-2x^3 float Interpolation_C1( float x ) { return x * x * (3.0 - 2.0 * x); } //    Quintic Hermite Curve.  As used by Perlin in Improved Noise.  http://mrl.nyu.edu/~perlin/paper445.pdf //    6x^5-15x^4+10x^3 float Interpolation_C2( float x ) { return x * x * x * (x * (x * 6.0 - 15.0) + 10.0); } //    Faster than Perlin Quintic.  Not quite as good shape. //    7x^3-7x^4+x^7 float Interpolation_C2_Fast( float x ) { float x3 = x*x*x; return ( 7.0 + ( x3 - 7.0 ) * x ) * x3; } //    C3 Interpolation function.  If anyone ever needs it... : ) //    25x^4-48x^5+25x^6-x^10 float Interpolation_C3( float x ) { float xsq = x*x; float xsqsq = xsq*xsq; return xsqsq * ( 25.0 - 48.0 * x + xsq * ( 25.0 - xsqsq ) ); } ```

Smooth falloff of a squared input from 0.0 to 1.0

A smooth falloff function is a function which decreases from 1.0 to 0.0 as its input increases from 0.0 to 1.0.  To say that it operates on x-squared means that it accepts a squared input rather than a linear one.  They can be used to efficiently fade out from a point due to the avoidance of a sqrt() call.

Listed here are two which are commonly used.  A graph of the functions can be seen here.  Included in the comments are some smooth interpolating functions also defined in x-squared.  Their graph can be seen here.

##### ```// //    Falloff defined in XSquared //    ( smoothly decrease from 1.0 to 0.0 as xsq increases from 0.0 to 1.0 ) // // ( 1.0 - x*x )^2 ( Used by Humus for lighting falloff in Just Cause 2. GPUPro 1 ) float Falloff_Xsq_C1( float xsq ) { xsq = 1.0 - xsq; return xsq*xsq; } // ( 1.0 - x*x )^3. NOTE: 2nd derivative is 0.0 at x=1.0, but non-zero at x=0.0 float Falloff_Xsq_C2( float xsq ) { xsq = 1.0 - xsq; return xsq*xsq*xsq; } // //  For interest here are some smooth interpolation functions defined in x-squared // //  C1:      f(x) = 1.0 - ( 1.0 - x*x )^2 //  C2:      f(x) = 9x^4-16x^6+9x^8-x^12 //  C2 Fast: f(x) = 5x^4-5x^6+x^10  or  f(x) = 3x^4-3x^8+x^12 //  C3:      f(x) = 10x^4-20x^6+15x^8-4x^10 // ```
Posted in Uncategorized | 1 Comment

## GPU Texture-Free Noise

Hash functions vs texture-based permutation tables

I’ve started working on a GPU procedural content editor and renderer.  A core component of this is a collection of GPU noise functions ( eg, Perlin, Cellular, Lattice etc… ).  I’ve taken interest in texture-free noise implementations which use hash functions to generate the pseudo random signal rather than texture-based permutation tables.  There are advantages and disadvantages for this approach.  In short some are….

• Hash functions tend to have a higher ease of use and self-sufficiency.  ( can drop directly into vertex and fragment shader code without additional texture binding )
• Hash functions are slower than texture based implementations on typical hardware but can run faster for massively parallel execution.
• Hash functions execute in constant time, where as texture based permutation tables can have varying execution times depending on texture cache thrashing.  This has the effect of noise execution slowing down when sampled on large scales.  ( eg, http://forums.nvidia.com/index.php?showtopic=51823 )

Additional information can be found on Stefan Gustavson’s webgl-noise wiki.  https://github.com/ashima/webgl-noise/wiki

I know of two floating-point hash functions which have been used to generate procedural noise in fragment shaders on the GPU.  These are…

I will explain and present optimized versions of each, but first a small note on the implementation of mod(a,b) in the fragment shader…

Optimizing mod(a,b) for a constant denominator

The standard implementation of mod(a,b) would be something like this….

##### ```float mod( float a, float b ) { return a - floor( a / b ) * b; } ```

As mentioned by Marc in the MNoise paper, if we have a constant denominator (b) then we can implement our own optimized divide-free mod specifically for that denominator. eg for 61…

##### ```float mod61( float a ) { return a - floor( a * ( 1.0 / 61.0 ) ) * 61.0; } ```

Some compilers do this automatically for you during optimization.  Others do not.  So it should be done as a matter of course.  Implementing mod explicitly like this also allows us to make additional optimizations depending on the context in which it is used.

A faster low quality version ( suffers from precision problems ) can be implemented as follows.  It can be used when exact mod functionality is not necessary.

##### ```float mod61_lq( float a ) { return fract( a * ( 1.0 / 61.0 ) ) * 61.0; } ```

BlumBlumShub / MNoise

( from wikipedia ) Blum Blum Shub (BBS) is a pseudorandom number generator proposed in 1986 by Lenore Blum, Manuel Blum and Michael Shub.  It takes the form  x_n_plus_1 = mod( x_n^2, M ) , where M=pq is the product of two large primes p and q.

Marc points out that the product of two sufficiently large primes generates a number too large for some GPU’s to handle and so settles on the prime number 61 stating that acceptable results were obtained.  This allows for implementations on 16bit or 24bit hardware.  The function is used the same way as a permutation table of size 61.

Here is some GLSL code which implements the BBS hash function to calculate pseudo random 0.0->1.0 hash values for the 4 corners of a 2D integer grid cell.

##### ```// implementation of the blumblumshub hash // as described in MNoise paper http://www.cs.umbc.edu/~olano/papers/mNoise.pdf vec4 BBS_coord_prepare(vec4 x) { // return mod( x, 61.0 ); // unoptimized reference return x - floor(x * ( 1.0 / 61.0 )) * 61.0; } vec4 BBS_permute(vec4 x) { // return mod( x * x, 61.0 ); // unoptimized reference return fract( x * x * ( 1.0 / 61.0 )) * 61.0; } vec4 BBS_permute_and_resolve(vec4 x) { // return mod( x * x, 61.0 ) / 61.0; // unoptimized reference return fract( x * x * ( 1.0 / 61.0 ) ); } vec4 BBS_hash( vec2 gridcell ) { //    gridcell is assumed to be an integer coordinate vec4 hash_coord = BBS_coord_prepare( vec4( gridcell.xy, gridcell.xy + 1.0.xx ) ); vec4 hash = BBS_permute( hash_coord.xzxz /* * 7.0 */ ); // * 7.0 will increase variance close to origin return BBS_permute_and_resolve( hash + hash_coord.yyww ); } ```

It should be noted that we’re seeding the BBS stream with the coordinates of X and Y and performing only one or two permutations.  By using it this way we’re visualizing the randomness of the BBS stream between consecutive seeds rather than the randomness of the BBS stream itself.  Ok, with that said lets move on…

We make use of the low quality mod for the BBS_permute function.  This is because we’re not after exact behavior, but rather in just getting “some randomness” from it.  The low quality mod is good enough for this.

One observation is the BBS permutation has the nice ( but non-essential ) property in that the sequence will wrap around 61 and repeat itself.  Ie,  BBS_permute( x ) == BBS_permute( 61 + x ).  It is non-essential because we clamp the 4 input coordinates (x,y,x+1,y+1) to the 0->60 range before beginning permutation.

Another observation is that it mirrors itself in the middle.  Ie, BBS_permute( x ) == BBS_permute( 61 – x ).  This is very undesirable as it creates visible artifacts in the noise, as can be seen in this example image.

BBS low quality 61×61

It is possible that we could limit the sampling space to 0->30 to eliminate the mirroring artifact.  ( ie,  BBS_coord_prepare would perform a mod31 instead of mod61 ).  But this would halve the already small domain from 60 to 30, and it can also be seen that the BBS hash still produces some very visible artifacts in the form of long repeating lines and patterns.  Luckily both of these problems can be reduced by performing an additional permutation along the X axis.  This additional permutation hides the mirroring and produces a final noise with much less visible artifacts.

##### ```vec4 BBS_hash_hq( vec2 gridcell ) { //    gridcell is assumed to be an integer coordinate vec4 hash_coord = BBS_coord_prepare( vec4( gridcell.xy, gridcell.xy + 1.0.xx ) ); vec4 hash = BBS_permute( hash_coord.xzxz /* * 7.0 */ ); // * 7.0 will increase variance close to origin hash = BBS_permute( hash + hash_coord.yyww ); return BBS_permute_and_resolve( hash + hash_coord.xzxz ); } ```

BBS high quality 61×61

NOTE:  Marc makes note in the MNoise reference code that to eliminate the slow ramp up of values close to the origin the first permutation should multiply the input coordinate by a small prime number ( eg 3 or 7 ).  I have not done this in either LQ or HQ versions because given other artifacts I have not felt it necessary.  But I’ve left it in as a comment for preference.

Stefan Gustavson and Ian McEwan have been doing some fantastic work with optimized texture-free noise.  They have supplied textured and texture-free versions of Classic Perlin, Simplex Perlin and Cellular noise.  Their work can be found here https://github.com/ashima/webgl-noise/ and here http://webstaff.itn.liu.se/~stegu/GLSL-cellular/.  For the texture-free versions they use a quadratic permutation polynomial for the core hashing functionality.

A permutation polynomial is a function which can permute ( or shuffle ) a set of numbers over a specific range into a different order.  So applying the permutation polynomial to any index over the specified range will generate a new index within the same range.  Applying this to each index will generate a new ordering for the elements in the set.  Permutation polynomials should provide a one-to-one mapping which is free from any clashing.

Stefan and Ian have chosen to use a quadratic permutation polynomial of the form F(x) = (34x^2 + x) mod 289.  This means it can operate over the integers of 0->288, acting just like a permutation table.  The shuffling is how the randomness is generated.  They obtain a 0.0->1.0 floating point number from the resulting integer like so, resolve(x) = fract( x * (1.0 / 41.0) ).  The (1.0/41.0) fraction results in a set of numbers quantized to 41 different values.  I have opted to use (7.0/288.0) instead because it produces a more varied result at no extra cost.

Here is some GLSL code which implements the SGPP hash function to calculate pseudo random 0.0->1.0 hash values for the 4 corners of a 2D integer grid cell.

##### ```// //    Permutation polynomial idea is from Stefan Gustavson's and Ian McEwan's work at... //    http://github.com/ashima/webgl-noise //    http://www.itn.liu.se/~stegu/GLSL-cellular // vec4 SGPP_coord_prepare(vec4 x) { // return mod( x, 289.0 ); // unoptimized reference return x - floor(x * ( 1.0 / 289.0 )) * 289.0; } vec4 SGPP_permute(vec4 x) { // return mod( ( 34.0 * x + 1.0 ) * x , 289.0 ); // unoptimized reference return fract( x * ( ( 34.0 / 289.0 ) * x + ( 1.0 / 289.0 ) ) ) * 289.0; // LQ mod } vec4 SGPP_resolve(vec4 x) { const float K = 7.0 / 288.0; return fract(x * K); } vec4 SGPP_hash( vec2 gridcell ) { //    gridcell is assumed to be an integer coordinate vec4 hash_coord = SGPP_coord_prepare( vec4( gridcell.xy, gridcell.xy + 1.0.xx ) ); return SGPP_resolve( SGPP_permute( SGPP_permute( hash_coord.xzxz ) + hash_coord.yyww ) ); } ```

Just like the BBS permute function, the permutation polynomial will wrap around and repeat itself.  Ie,  SGPP_permute( x ) == SGPP_permute( 289 + x ).  And again, it is non-essential because we clamp the 4 input coordinates (x,y,x+1,y+1) to the 0->288 range before beginning permutation.

Also like the BBS permute function we make use of the low quality mod for SGPP_permute.  This is because we’re not after exact permutation polynomial behavior, but rather in just getting “some randomness” from it.  The low quality mod is good enough for this.

Here is an image showing a 61×61 sampling of the SGPP hashing function  ( 61×61 for comparison with BBS, the actual domain is 289×289 ).  As can be seen there is good variance over the image as a whole, but there are small localized areas of repeating patterns.  These can be either problematic or acceptable given the context in which it is used.  ( eg, Lattice, Perlin or Cellular noise )

61×61 sampling of SGPP hash function

A note on integer hashing

I am targeting DirectX9 level hardware so have opted for floating point hashing functions.  Modern cards can also perform integer hashing which in some cases produce superior results but will often still run slower than floating point operations.

Posted in Uncategorized | 1 Comment

## Dev blogs are cooler than twitter

I’ve always liked developer blogs.  They provide a great repository and sharing facility for those small-but-often-very-useful code gems that arise from daily work.

After 10 years of coding I thought it was finally time I started a developer blog of my own.

So here it is.  I’m guessing nobody will ever read it, but I’m going to write it anyway 🙂

Posted in Uncategorized | 3 Comments