Over the last week, I’ve been working on optimizing E15 for speed and efficiency. So far I’ve implemented two performance improvements: texture tiling and frustum culling. I’ll start with texture tiling.
Texture Tiling
We were using non-power-of-two (NPOT) textures in E15, since they are supported and they make everything easy: the source images for our textures don’t necessarily have POT dimensions. For most images this is fine, since they are small and manageable, and with OpenGL 2.1 most things work with NPOT textures. Performance issues arise with large textures, which in our case come from rendering web pages. When a web page is turned into a bitmap, it becomes huge; blogs are especially large, easily reaching 15,000 pixels high. Of course, textures that large exceed the maximum texture size OpenGL supports, so we decided to go back to POT and tile the images by subdividing them into multiple textures applied to multiple quads.
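To make “too large” concrete: an OpenGL implementation reports its limit through GL_MAX_TEXTURE_SIZE (queried with glGetIntegerv), and an image with either dimension above that limit cannot be a single texture. The sketch below is a hypothetical helper, not E15 code; it takes the limit as a parameter so it runs without a GL context, and the 1024×15,000 page and 4096 limit in the usage note are illustrative assumptions, not measurements.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: can a w x h image be uploaded as one texture?
   max_texture_size would normally come from
   glGetIntegerv(GL_MAX_TEXTURE_SIZE, &value); it is passed in here so the
   function does not need a GL context. */
static bool fits_in_one_texture(size_t w, size_t h, size_t max_texture_size)
{
    return w <= max_texture_size && h <= max_texture_size;
}

/* Size in bytes of the image as uncompressed 8-bit RGBA (4 bytes per pixel). */
static size_t rgba_bytes(size_t w, size_t h)
{
    return w * h * 4;
}
```

With those assumed numbers, fits_in_one_texture(1024, 15000, 4096) is false, and the page alone would need rgba_bytes(1024, 15000) = 61,440,000 bytes (about 59 MiB).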
Going back to POT was a good move anyway, since on the ATI X1900 hardware mipmap generation seems to be supported only for POT textures (which is why we were originally using gluBuild2DMipmaps). Implementing tiling was relatively straightforward. Here’s what needs to get done:
- Obtain the image, then create a new image whose dimensions are rounded up to the next multiple of the tile size (itself a POT), so that every tile is POT.
- Draw the original image into the new image.
- Create textures by subdividing the new image into tiles of a predefined size.
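The arithmetic behind those steps is small enough to show on its own. This is a sketch with hypothetical helper names, assuming (as E15 does) that the tile size is a power of two; that assumption is what lets rounding up be done with a bit mask.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if n is a power of two: nonzero with exactly one bit set. */
static bool is_pot(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Round n up to the next multiple of tile, which must be a power of two.
   Clearing the low bits rounds down to a multiple of tile, so adding
   tile - 1 first turns that into rounding up. */
static size_t round_up_to_tile(size_t n, size_t tile)
{
    return (n + tile - 1) & ~(tile - 1);
}

/* Number of tiles needed along one axis of an n-pixel image. */
static size_t tiles_along(size_t n, size_t tile)
{
    return round_up_to_tile(n, tile) / tile;
}
```

For a 300×15,000 image and 256-pixel tiles, the padded image is 512×15,104, which is 2 × 59 = 118 tiles.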
All images are supplied as a CGImageRef, so I implemented a new method that carries out the steps above. It is pretty simple: you pass a CGImageRef and a tile size, and it returns an array of OpenGL texture IDs.
- (GLuint *)createTiledTexturesFromCGImage:(CGImageRef)cgImage
                                  tileSize:(int)newTileSize
{
    GLuint *textureNames = NULL;  // stays NULL if there is no image
    if (cgImage) {
        size_t tile = (size_t)newTileSize;
        size_t image_w = CGImageGetWidth(cgImage);
        size_t image_h = CGImageGetHeight(cgImage);
        // pad both dimensions up to the next multiple of the tile size
        size_t width = ((image_w + tile - 1) / tile) * tile;
        size_t height = ((image_h + tile - 1) / tile) * tile;
        size_t spacing_h = height - image_h;

        // zero-filled 32-bit ARGB buffer for the padded image
        void *tData = calloc(width * 4, height);
        // Quartz's origin is bottom-left, so offsetting by spacing_h puts the
        // image at the top of the bitmap, i.e. in the first rows in memory
        CGRect rect = CGRectMake(0, spacing_h, image_w, image_h);
        CGColorSpaceRef color_space = CGColorSpaceCreateDeviceRGB();
        CGContextRef myBitmapContext = CGBitmapContextCreate(
            tData, width, height, 8, width * 4, color_space,
            kCGImageAlphaPremultipliedFirst);
        CGContextDrawImage(myBitmapContext, rect, cgImage);

        // width and height are exact multiples of tile
        int perWidth = (int)(width / tile);
        int perHeight = (int)(height / tile);
        int numTextureNames = perWidth * perHeight;
        textureNames = malloc(sizeof(GLuint) * numTextureNames);

        textureType = GL_TEXTURE_2D;
        glEnable(textureType);
        glGenTextures(numTextureNames, textureNames);

        // back up the default pixel store state
        glPushClientAttrib(GL_CLIENT_PIXEL_STORE_BIT);
        // rows in tData are `width` pixels long and tightly packed
        glPixelStorei(GL_UNPACK_ROW_LENGTH, (GLint)width);
        glPixelStorei(GL_UNPACK_ALIGNMENT, 1);

        int onY;
        for (onY = 0; onY < perHeight; onY++) {
            int onX;
            for (onX = 0; onX < perWidth; onX++) {
                int onTexture = onY * perWidth + onX;
                // offset of this tile in the bitmap
                int x = onX * newTileSize;
                int y = onY * newTileSize;
                // extent of this tile (always newTileSize, since the bitmap is padded)
                int dx = MINOF2((int)width - x, newTileSize);
                int dy = MINOF2((int)height - y, newTileSize);
                // start reading the bitmap at (x, y)
                glPixelStorei(GL_UNPACK_SKIP_PIXELS, x);
                glPixelStorei(GL_UNPACK_SKIP_ROWS, y);

                glBindTexture(textureType, textureNames[onTexture]);
                glTexParameteri(textureType, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
                glTexParameteri(textureType, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
                glTexParameteri(textureType, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
                glTexParameteri(textureType, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
                glTexParameteri(textureType, GL_TEXTURE_BASE_LEVEL, 0);
                glTexParameteri(textureType, GL_TEXTURE_MAX_LEVEL, 4);
                glTexParameteri(textureType, GL_TEXTURE_MIN_LOD, 0);
                glTexParameteri(textureType, GL_TEXTURE_MAX_LOD, 4);
                // must be enabled before the image data is specified
                glTexParameteri(textureType, GL_GENERATE_MIPMAP, GL_TRUE);
                // on little-endian (Intel) Macs, ARGB bytes read as
                // GL_UNSIGNED_INT_8_8_8_8 come out in GL_BGRA order
                glTexImage2D(textureType, 0, GL_RGBA, newTileSize,
                             newTileSize, 0, GL_BGRA, GL_UNSIGNED_INT_8_8_8_8, NULL);
                glTexSubImage2D(textureType, 0, 0, 0, dx, dy, GL_BGRA,
                                GL_UNSIGNED_INT_8_8_8_8, tData);
            }
        }

        // restore the default pixel store state
        glPopClientAttrib();
        glDisable(textureType);

        // release
        CGColorSpaceRelease(color_space);
        CGContextRelease(myBitmapContext);
        free(tData);
    }
    return textureNames;  // malloc'd array; the caller must free() it
}
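The unpack parameters are what let every tile be read straight out of the one padded bitmap, with no per-tile copy on our side. As an illustration only (a hypothetical helper, not something E15 or OpenGL provides), this C function does on the CPU what those settings ask the driver to do for 4-byte pixels with GL_UNPACK_ALIGNMENT set to 1: row r of a tile starts (skip_rows + r) * row_length + skip_pixels pixels into the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical CPU model of GL_UNPACK_ROW_LENGTH, GL_UNPACK_SKIP_PIXELS and
   GL_UNPACK_SKIP_ROWS: copies the dx x dy block at (skip_pixels, skip_rows)
   out of a bitmap whose rows are row_length pixels long, into a tightly
   packed tile buffer of dx * dy pixels. */
static void copy_tile(const uint32_t *bitmap, size_t row_length,
                      size_t skip_pixels, size_t skip_rows,
                      size_t dx, size_t dy, uint32_t *tile)
{
    for (size_t row = 0; row < dy; row++) {
        /* start of this tile row inside the full bitmap */
        const uint32_t *src = bitmap + (skip_rows + row) * row_length + skip_pixels;
        memcpy(tile + row * dx, src, dx * sizeof *src);
    }
}
```

For a 4×4 bitmap holding the values 0 to 15 in row order, copying the 2×2 block at (2, 1) yields 6, 7, 10, 11.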
We create each texture with glTexImage2D and NULL data, then use glTexSubImage2D to upload a dx × dy region, so that tiles at the right and bottom edges only receive the part of the bitmap that exists. In hindsight this is probably unnecessary: because the bitmap is padded to a multiple of the tile size, dx and dy always equal the tile size, so a single glTexImage2D call with the bitmap pointer would do the same job. Now all we need to do is iterate through the textures and create the necessary quads in our scene. Initially, I made every quad the size of its texture (the tile size), but the edge quads then extended past the image and overlapped other quads, which caused rendering quirks. The solution is to size the quads so that together they cover exactly the original image. Edge quads are therefore not square: each one is only as large as needed to show what remains of the original image, and its texture coordinates are cut down by the same fraction. Here’s a code snippet from the scene:
unsigned i = 0;
float x, y;
// walk the tiles top to bottom, left to right, in the order they were created
for (y = 0; y > -dob.h; y -= textureSize) {
    // height of this row of tiles (shorter for the last row)
    float dy = MINOF2(dob.h + y, textureSize);
    glPushMatrix();
    glTranslatef(0, 2*y/mapScaler, 0);
    for (x = 0; x < dob.w; x += textureSize) {
        glPushMatrix();
        glTranslatef(2*x/mapScaler, 0, 0);
        if (dob.textureIds[i]) {
            if (renderMode == GL_SELECT) {
                glLoadName(j);  // j: this object's pick name, defined outside this snippet
            }
            // width of this tile (narrower for the last column)
            float dx = MINOF2(dob.w - x, textureSize);
            glBindTexture(GL_TEXTURE_2D, dob.textureIds[i]);
            glBegin(GL_QUADS);
            // Page textures are flipped (texture row 0 is the top of the page). Compensate for that.
            glTexCoord2f(0.0f, 0.0f);
            glVertex3f(0.0f, 0.0f, 0.0f);
            glTexCoord2f(dx/textureSize, 0.0f);
            glVertex3f(2*dx/mapScaler, 0.0f, 0.0f);
            glTexCoord2f(dx/textureSize, dy/textureSize);
            glVertex3f(2*dx/mapScaler, -2*dy/mapScaler, 0.0f);
            glTexCoord2f(0.0f, dy/textureSize);
            glVertex3f(0.0f, -2*dy/mapScaler, 0.0f);
            glEnd();
        }
        glPopMatrix();  // pop even when this tile has no texture, to keep the stack balanced
        i++;
    }
    glPopMatrix();
}
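The sizing rule used in the scene code can be isolated as follows: an edge quad and its texture coordinates shrink by the same fraction of the tile. This sketch uses hypothetical names and works in image pixels, before the scene’s 2/mapScaler scaling; col and row count tiles from the top-left corner of the image.

```c
/* Size and texture extent of one tile's quad. */
typedef struct {
    float w, h;          /* quad size in image pixels */
    float s_max, t_max;  /* texture coordinates of the corner opposite (0, 0) */
} TileQuad;

/* Quad for the tile at (col, row). Interior tiles are tile x tile with
   texture coordinates up to 1.0; tiles on the right or bottom edge cover
   only what is left of the image, and their texture coordinates stop at
   the same fraction so the padding is never drawn. */
static TileQuad tile_quad(float image_w, float image_h, float tile, int col, int row)
{
    float x = (float)col * tile;
    float y = (float)row * tile;
    TileQuad q;
    q.w = (image_w - x < tile) ? image_w - x : tile;
    q.h = (image_h - y < tile) ? image_h - y : tile;
    q.s_max = q.w / tile;
    q.t_max = q.h / tile;
    return q;
}
```

Here s_max and t_max play the role of dx/textureSize and dy/textureSize in the scene, and the vertex extents are q.w and q.h scaled by 2/mapScaler.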
Now we can handle large sites, since we’re only ever rendering 256×256 tiles.
Frustum Culling
Implementing frustum culling was pretty straightforward. The only trouble I had was that I was multiplying my transformation matrices in the wrong order. Remember, matrix multiplication doesn’t commute! This is a good article that you can follow to implement it. Once it was in place, the performance boost was noticeable in examples that use lots of large textures. Next we need to work on a texture manager that does manual mipmapping, displaying different images at different camera positions.
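For reference, here is a minimal sketch of one common approach, the Gribb/Hartmann plane extraction; it is not necessarily what E15 or the article does. Multiply projection by modelview in that order, read the six clip planes off the combined matrix, and reject any object whose bounding sphere lies entirely outside one plane. Function names are hypothetical, and the matrices are assumed to be column-major, as glGetFloatv returns them.

```c
#include <stdbool.h>

/* 4x4 matrices are stored column-major, the layout glGetFloatv returns for
   GL_MODELVIEW_MATRIX and GL_PROJECTION_MATRIX. */
#define MAT_AT(m, row, col) ((m)[(col) * 4 + (row)])

/* out = a * b (out must not alias a or b). Order matters: the clip matrix
   is projection * modelview; modelview * projection is a different matrix. */
static void mat4_mul(const float a[16], const float b[16], float out[16])
{
    for (int r = 0; r < 4; r++) {
        for (int c = 0; c < 4; c++) {
            float sum = 0.0f;
            for (int k = 0; k < 4; k++)
                sum += MAT_AT(a, r, k) * MAT_AT(b, k, c);
            MAT_AT(out, r, c) = sum;
        }
    }
}

/* A point is inside when -w <= x, y, z <= w in clip space, so each plane is
   row 3 plus or minus row 0, 1 or 2 of the clip matrix. Planes are stored as
   (a, b, c, d), with a*x + b*y + c*z + d >= 0 on the inside, unnormalized. */
static void extract_planes(const float clip[16], float planes[6][4])
{
    for (int axis = 0; axis < 3; axis++) {
        for (int j = 0; j < 4; j++) {
            planes[2 * axis][j]     = MAT_AT(clip, 3, j) + MAT_AT(clip, axis, j); /* left, bottom, near */
            planes[2 * axis + 1][j] = MAT_AT(clip, 3, j) - MAT_AT(clip, axis, j); /* right, top, far */
        }
    }
}

/* True unless the sphere lies entirely outside some plane. The center is in
   the space the clip matrix maps from (object space for projection * modelview).
   Comparing squares avoids normalizing the planes with a sqrt:
   dot < -radius * |n|  <=>  dot < 0 && dot * dot > radius * radius * |n|^2. */
static bool sphere_in_frustum(float planes[6][4], float x, float y, float z, float radius)
{
    for (int p = 0; p < 6; p++) {
        const float *pl = planes[p];
        float dot = pl[0] * x + pl[1] * y + pl[2] * z + pl[3];
        float n2 = pl[0] * pl[0] + pl[1] * pl[1] + pl[2] * pl[2];
        if (dot < 0.0f && dot * dot > radius * radius * n2)
            return false;
    }
    return true;
}
```

In a render loop one would fetch both matrices with glGetFloatv, call mat4_mul(projection, modelview, clip), extract the planes once per frame, and skip drawing any quad whose bounding sphere fails the test.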