While most VGA games use mode 13h with its simple linear arrangement of data, many games use VGA mode X. This splits pixel data into four planes, similar to the way the EGA does. Unlike the EGA, where each plane holds one bit of every pixel, each pixel is contained entirely within a single plane, but adjacent pixels are read from different planes.
Using a 320×200 image as an example, think of the file as containing four 80×200 images, one after the other. The 200 rows in each image correspond to the 200 rows in the final image; however, the X-coordinates are handled differently. Each row in the first image supplies the pixels at X-coordinates 0, 4, 8, etc. Each row in the second image supplies the pixels at X-coordinates 1, 5, 9, etc., and the third and fourth images follow the same pattern starting at X-coordinates 2 and 3 respectively.
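The layout described above can be converted into an ordinary linear (mode 13h style) buffer with a short routine. The following is a minimal sketch, assuming the four planes are stored back to back with no header or padding, and that the width is a multiple of 4. The function name `planar_to_linear` is hypothetical and not part of any particular game's code.

```python
def planar_to_linear(data: bytes, width: int = 320, height: int = 200) -> bytearray:
    """Convert four consecutive planes into a linear, row-by-row pixel buffer.

    Assumes (hypothetically) that plane p is stored as (width // 4) x height
    bytes, immediately after plane p - 1, with no header.
    """
    if width % 4 != 0:
        raise ValueError("width must be a multiple of 4")
    plane_width = width // 4            # 80 bytes per row for a 320-pixel image
    plane_size = plane_width * height   # 16000 bytes per plane for 320x200
    if len(data) < 4 * plane_size:
        raise ValueError("not enough data for four planes")

    out = bytearray(width * height)
    for plane in range(4):
        base = plane * plane_size
        for y in range(height):
            row = data[base + y * plane_width : base + (y + 1) * plane_width]
            # Plane p fills every fourth pixel of the row, starting at X = p
            out[y * width + plane : (y + 1) * width : 4] = row
    return out
```

For example, with an 8×2 image stored as `bytes(range(16))`, the first output row is `0, 4, 8, 12, 1, 5, 9, 13`: each group of four consecutive pixels takes one byte from each plane in turn.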
For example, each 80×200 image is 16,000 bytes long, so the byte at position 16081 (counting from 0) is at offset 81 into the second image. Since there are 80 bytes per row, this is in the second row, so the Y-coordinate of this pixel is 1 (counting from 0). Offset 81 is one byte into this row, so the X-coordinate is 4 (following the 0, 4, 8, etc. pattern above). However, this is the second 80×200 image, so we add 1 to this, giving a final X-coordinate of 5. Therefore the byte at position 16081 corresponds to the pixel at coordinates (5,1).
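The same arithmetic can be written as a pair of conversion functions, one from a byte position to pixel coordinates and one back again. This is a sketch assuming a 320×200 image with the four 16,000-byte planes stored consecutively and zero-based positions; the names `position_to_xy` and `xy_to_position` are hypothetical.

```python
PLANE_WIDTH = 80                 # bytes per row in each plane (320 / 4)
PLANE_SIZE = PLANE_WIDTH * 200   # 16000 bytes per plane


def position_to_xy(pos: int) -> tuple[int, int]:
    """Map a zero-based byte position in the file to (x, y) pixel coordinates."""
    plane, offset = divmod(pos, PLANE_SIZE)     # which 80x200 image, and where in it
    y, column = divmod(offset, PLANE_WIDTH)     # row and byte within that row
    return column * 4 + plane, y                # every fourth pixel, shifted by plane


def xy_to_position(x: int, y: int) -> int:
    """Inverse mapping: pixel coordinates back to a zero-based byte position."""
    return (x % 4) * PLANE_SIZE + y * PLANE_WIDTH + x // 4


print(position_to_xy(16081))  # (5, 1)
```

The inverse is handy when writing an image back out in this layout: the plane is simply `x % 4`, and the byte within the plane's row is `x // 4`.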