**Hash functions vs texture-based permutation tables**

I’ve started working on a GPU procedural content editor and renderer. A core component of this is a collection of GPU noise functions ( eg, Perlin, Cellular, Lattice etc… ). I’ve taken interest in texture-free noise implementations which use hash functions to generate the pseudo random signal rather than texture-based permutation tables. There are advantages and disadvantages for this approach. In short some are….

- Hash functions tend to have a higher ease of use and self-sufficiency. ( can drop directly into vertex and fragment shader code without additional texture binding )
- Hash functions are slower than texture based implementations on typical hardware but can run faster for massively parallel execution.
- Hash functions execute in constant time, where as texture based permutation tables can have varying execution times depending on texture cache thrashing. This has the effect of noise execution slowing down when sampled on large scales. ( eg, http://forums.nvidia.com/index.php?showtopic=51823 )

Additional information can be found on Stefan Gustavson’s webgl-noise wiki. https://github.com/ashima/webgl-noise/wiki

I know of two floating-point hash functions which have been used to generate procedural noise in fragment shaders on the GPU. These are…

- BlumBlumShub, as used in Marc Olano’s MNoise. http://www.cs.umbc.edu/~olano/papers/index.html#mNoise
- Quadratic Permutation Polynomials, as used in Stefan Gustavson’s and Ian McEwan’s work on webgl-noise. https://github.com/ashima/webgl-noise

I will explain and present optimized versions of each, but first a small note on the implementation of mod(a,b) in the fragment shader…

**Optimizing mod(a,b) for a constant denominator
**

The standard implementation of mod(a,b) would be something like this….

#####
float mod( float a, float b ) {
return a - floor( a / b ) * b;
}

As mentioned by Marc in the MNoise paper, if we have a constant denominator (b) then we can implement our own optimized divide-free mod specifically for that denominator. eg for 61…

#####
float mod61( float a ) {
return a - floor( a * ( 1.0 / 61.0 ) ) * 61.0;
}

Some compilers do this automatically for you during optimization. Others do not. So it should be done as a matter of course. Implementing mod explicitly like this also allows us to make additional optimizations depending on the context in which it is used.

A faster low quality version ( suffers from precision problems ) can be implemented as follows. It can be used when exact mod functionality is not necessary.

#####
float mod61_lq( float a ) {
return fract( a * ( 1.0 / 61.0 ) ) * 61.0;
}

**BlumBlumShub / MNoise**

( from wikipedia ) Blum Blum Shub (BBS) is a pseudorandom number generator proposed in 1986 by Lenore Blum, Manuel Blum and Michael Shub. It takes the form *x_n_plus_1 = mod( x_n^2, M )* , where M=pq is the product of two large primes p and q.

Marc points out that the product of two sufficiently large primes generates a number too large for some GPU’s to handle and so settles on the prime number 61 stating that acceptable results were obtained. This allows for implementations on 16bit or 24bit hardware. The function is used the same way as a permutation table of size 61.

Here is some GLSL code which implements the BBS hash function to calculate pseudo random 0.0->1.0 hash values for the 4 corners of a 2D integer grid cell.

#####
// implementation of the blumblumshub hash
// as described in MNoise paper http://www.cs.umbc.edu/~olano/papers/mNoise.pdf
vec4 BBS_coord_prepare(vec4 x) {
// return mod( x, 61.0 ); // unoptimized reference
return x - floor(x * ( 1.0 / 61.0 )) * 61.0;
}
vec4 BBS_permute(vec4 x) {
// return mod( x * x, 61.0 ); // unoptimized reference
return fract( x * x * ( 1.0 / 61.0 )) * 61.0;
}
vec4 BBS_permute_and_resolve(vec4 x) {
// return mod( x * x, 61.0 ) / 61.0; // unoptimized reference
return fract( x * x * ( 1.0 / 61.0 ) );
}
vec4 BBS_hash( vec2 gridcell ) {
// gridcell is assumed to be an integer coordinate
vec4 hash_coord = BBS_coord_prepare( vec4( gridcell.xy, gridcell.xy + 1.0.xx ) );
vec4 hash = BBS_permute( hash_coord.xzxz /* * 7.0 */ ); // * 7.0 will increase variance close to origin
return BBS_permute_and_resolve( hash + hash_coord.yyww );
}

It should be noted that we’re seeding the BBS stream with the coordinates of X and Y and performing only one or two permutations. By using it this way we’re visualizing the randomness of the BBS stream between consecutive seeds rather than the randomness of the BBS stream itself. Ok, with that said lets move on…

We make use of the low quality mod for the *BBS_permute* function. This is because we’re not after exact behavior, but rather in just getting “some randomness” from it. The low quality mod is good enough for this.

One observation is the BBS permutation has the nice ( but non-essential ) property in that the sequence will wrap around 61 and repeat itself. Ie, *BBS_permute( x ) == BBS_permute( 61 + x )*. It is non-essential because we clamp the 4 input coordinates (x,y,x+1,y+1) to the 0->60 range before beginning permutation.

Another observation is that it mirrors itself in the middle. Ie, *BBS_permute( x ) == BBS_permute( 61 – x )*. This is very undesirable as it creates visible artifacts in the noise, as can be seen in this example image.

It is possible that we could limit the sampling space to 0->30 to eliminate the mirroring artifact. ( ie, *BBS_coord_prepare* would perform a mod31 instead of mod61 ). But this would halve the already small domain from 60 to 30, and it can also be seen that the BBS hash still produces some very visible artifacts in the form of long repeating lines and patterns. Luckily both of these problems can be reduced by performing an additional permutation along the X axis. This additional permutation hides the mirroring and produces a final noise with much less visible artifacts.

#####
vec4 BBS_hash_hq( vec2 gridcell ) {
// gridcell is assumed to be an integer coordinate
vec4 hash_coord = BBS_coord_prepare( vec4( gridcell.xy, gridcell.xy + 1.0.xx ) );
vec4 hash = BBS_permute( hash_coord.xzxz /* * 7.0 */ ); // * 7.0 will increase variance close to origin
hash = BBS_permute( hash + hash_coord.yyww );
return BBS_permute_and_resolve( hash + hash_coord.xzxz );
}

NOTE: Marc makes note in the MNoise reference code that to eliminate the slow ramp up of values close to the origin the first permutation should multiply the input coordinate by a small prime number ( eg 3 or 7 ). I have not done this in either LQ or HQ versions because given other artifacts I have not felt it necessary. But I’ve left it in as a comment for preference.

**Quadratic Permutation Polynomial / WebGL-noise**

Stefan Gustavson and Ian McEwan have been doing some fantastic work with optimized texture-free noise. They have supplied textured and texture-free versions of Classic Perlin, Simplex Perlin and Cellular noise. Their work can be found here https://github.com/ashima/webgl-noise/ and here http://webstaff.itn.liu.se/~stegu/GLSL-cellular/. For the texture-free versions they use a quadratic permutation polynomial for the core hashing functionality.

A permutation polynomial is a function which can permute ( or shuffle ) a set of numbers over a specific range into a different order. So applying the permutation polynomial to any index over the specified range will generate a new index within the same range. Applying this to each index will generate a new ordering for the elements in the set. Permutation polynomials should provide a one-to-one mapping which is free from any clashing.

Stefan and Ian have chosen to use a quadratic permutation polynomial of the form *F(x) = (34x^2 + x) mod 289*. This means it can operate over the integers of 0->288, acting just like a permutation table. The shuffling is how the randomness is generated. They obtain a 0.0->1.0 floating point number from the resulting integer like so, *resolve(x) = fract( x * (1.0 / 41.0) )*. The (1.0/41.0) fraction results in a set of numbers quantized to 41 different values. I have opted to use (7.0/288.0) instead because it produces a more varied result at no extra cost.

Here is some GLSL code which implements the SGPP hash function to calculate pseudo random 0.0->1.0 hash values for the 4 corners of a 2D integer grid cell.

#####
//
// Permutation polynomial idea is from Stefan Gustavson's and Ian McEwan's work at...
// http://github.com/ashima/webgl-noise
// http://www.itn.liu.se/~stegu/GLSL-cellular
//
vec4 SGPP_coord_prepare(vec4 x) {
// return mod( x, 289.0 ); // unoptimized reference
return x - floor(x * ( 1.0 / 289.0 )) * 289.0;
}
vec4 SGPP_permute(vec4 x) {
// return mod( ( 34.0 * x + 1.0 ) * x , 289.0 ); // unoptimized reference
return fract( x * ( ( 34.0 / 289.0 ) * x + ( 1.0 / 289.0 ) ) ) * 289.0; // LQ mod
}
vec4 SGPP_resolve(vec4 x) {
const float K = 7.0 / 288.0;
return fract(x * K);
}
vec4 SGPP_hash( vec2 gridcell ) {
// gridcell is assumed to be an integer coordinate
vec4 hash_coord = SGPP_coord_prepare( vec4( gridcell.xy, gridcell.xy + 1.0.xx ) );
return SGPP_resolve( SGPP_permute( SGPP_permute( hash_coord.xzxz ) + hash_coord.yyww ) );
}

Just like the BBS permute function, the permutation polynomial will wrap around and repeat itself. Ie, *SGPP_permute( x ) == SGPP_permute( 289 + x )*. And again, it is non-essential because we clamp the 4 input coordinates (x,y,x+1,y+1) to the 0->288 range before beginning permutation.

Also like the BBS permute function we make use of the low quality mod for *SGPP_permute*. This is because we’re not after exact permutation polynomial behavior, but rather in just getting “some randomness” from it. The low quality mod is good enough for this.

Here is an image showing a 61×61 sampling of the SGPP hashing function ( 61×61 for comparison with BBS, the actual domain is 289×289 ). As can be seen there is good variance over the image as a whole, but there are small localized areas of repeating patterns. These can be either problematic or acceptable given the context in which it is used. ( eg, Lattice, Perlin or Cellular noise )

**A note on integer hashing**

I am targeting DirectX9 level hardware so have opted for floating point hashing functions. Modern cards can also perform integer hashing which in some cases produce superior results but will often still run slower than floating point operations.

Some resources can be found here…

http://umbcgaim.wordpress.com/2010/07/01/gpu-random-numbers/

http://www.burtleburtle.net/bob/hash/integer.html

http://forums.nvidia.com/index.php?showtopic=51823

Pingback: A fast and simple 32bit floating point hash function | briansharpe