With my approach, I traverse the DOM to extract the required XHTML elements and their values (legend, label, input fields, select, textarea) and regenerate them on the same page in an OCR-friendly layout. I’m currently identifying and categorizing the elements by purpose and importance so I can come up with a suitable label layout. I’ve attached two images to show what a page looks like before and after the feature is applied.
Normal web page
Generated OCR friendly form
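To make the extraction step described above more concrete, here is a minimal sketch of it. It is not the actual implementation, which walks the live DOM in the browser. It assumes Python's standard-library `html.parser` as a stand-in for the DOM traversal, and the names `FormFieldCollector`, `extract_form_fields`, and `SAMPLE` are hypothetical, chosen only for illustration. It collects the form elements in document order so that a later step could lay them out again.

```python
from html.parser import HTMLParser

# Tags named in the post as relevant to the OCR-friendly layout.
FORM_TAGS = {"legend", "label", "input", "select", "textarea"}


class FormFieldCollector(HTMLParser):
    """Hypothetical helper: collects form-related elements in document order."""

    def __init__(self):
        super().__init__()
        self.fields = []    # one dict per collected element
        self._open = None   # element whose text is currently being gathered

    def handle_starttag(self, tag, attrs):
        if tag not in FORM_TAGS:
            return
        field = {"tag": tag, "attrs": dict(attrs), "text": ""}
        self.fields.append(field)
        # <input> is a void element and carries no text, so it never
        # becomes the element we gather text for.
        if tag != "input":
            self._open = field

    def handle_endtag(self, tag):
        if self._open is not None and self._open["tag"] == tag:
            self._open = None

    def handle_data(self, data):
        if self._open is not None:
            self._open["text"] += data


def extract_form_fields(html):
    """Return the collected fields with whitespace in their text normalized."""
    collector = FormFieldCollector()
    collector.feed(html)
    collector.close()
    for field in collector.fields:
        field["text"] = " ".join(field["text"].split())
    return collector.fields


# Made-up sample markup, used only to exercise the sketch.
SAMPLE = """
<form>
  <fieldset>
    <legend>Contact details</legend>
    <label for="email">Email</label>
    <input type="text" id="email" name="email" />
    <label for="note">Note</label>
    <textarea id="note" name="note"></textarea>
  </fieldset>
</form>
"""

fields = extract_form_fields(SAMPLE)
for field in fields:
    print(field["tag"], field["attrs"].get("name", ""), field["text"])
```

Keeping the elements as a flat, ordered list of plain dictionaries leaves the categorization by purpose and importance as a separate step, which matches the fact that the final label layout is still being worked out.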