8. Advance the top and bottom cutting lines down or up, respectively, provided that the difference is less than a threshold θ3 and the luminosity of the current cutting line does not significantly differ from that of its predecessor. (We are eating up a marginal artifact.)
9. Write out the portion of the image that is between the four cutting lines to a file.
4. Multiresolution segmentation method
Our method of segmenting and classifying regions of a page has been described previously in [15]. A multiresolution approach brings two benefits: (a) the image information essential to a structural analysis of the page is processed at a resolution at which high-frequency noise has already been filtered out, which prevents the introduction of small noise regions; and (b) speed is gained by performing most of the analysis steps on a reduced-resolution representation of the image.
The algorithm proceeds in two phases. During the first phase, four feature pyramids are constructed from the document image. During the second phase, each region of the image is classified by rule into one of five categories: background, text, line-drawing, photograph, or unknown.
4.1. Phase 1: construction of feature pyramids
The first phase constructs four pyramids. First, four “feature images” are computed from the original: a “mean” image, a “median” image, a “variance” image, and a “threshold” image. Let the width and height of the original image be m and n, respectively. The mean image, of size m/16 by n/16, is obtained by computing the average pixel value for each 16×16 block of pixels in the original. The three other feature images have the same dimensions as the mean image, but they contain the median, variance, and “threshold” values over the 16×16 blocks rather than the mean value. Each value in the threshold image is the number of pixels within the corresponding 16×16 block that exceed a threshold θ4=250 in the original image.
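The computation of the four feature images can be illustrated with a minimal Python sketch. It is not the implementation used in this work. The function name `feature_images`, the representation of the image as a list of rows of gray values, the dropping of partial blocks at the right and bottom edges, and the use of population variance are all assumptions made for illustration. Only the 16×16 block size and θ4=250 come from the text.

```python
from statistics import mean, median, pvariance

BLOCK = 16    # block size given in the text
THETA4 = 250  # threshold θ4 given in the text


def feature_images(image, block=BLOCK, theta4=THETA4):
    """Return (mean, median, variance, threshold) feature images.

    `image` is a list of rows of gray values (hypothetical representation).
    Assumptions, not from the source: partial blocks at the right and
    bottom edges are dropped, and variance is the population variance.
    """
    rows = len(image) // block
    cols = len(image[0]) // block if image else 0
    mean_img, median_img, var_img, thresh_img = [], [], [], []
    for by in range(rows):
        mean_row, median_row, var_row, thresh_row = [], [], [], []
        for bx in range(cols):
            # Gather the pixels of one block into a flat list.
            pixels = [
                image[y][x]
                for y in range(by * block, (by + 1) * block)
                for x in range(bx * block, (bx + 1) * block)
            ]
            mean_row.append(mean(pixels))
            median_row.append(median(pixels))
            var_row.append(pvariance(pixels))
            # Count the pixels strictly above the threshold.
            thresh_row.append(sum(1 for p in pixels if p > theta4))
        mean_img.append(mean_row)
        median_img.append(median_row)
        var_img.append(var_row)
        thresh_img.append(thresh_row)
    return mean_img, median_img, var_img, thresh_img
```

For an image of width m and height n, each returned image has n/16 rows and m/16 columns (rounded down under the edge-handling assumption above). Plain lists keep the sketch self-contained; an array library would be the more natural choice for full-size page images.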