Moving around the testing notebooks. Autocropping is about done,
with the exception of any new versions or converting the code
to C.
Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
See title. The plan is to convert all the houghline processing
to C, since it should be faster than the Python OpenCV version.
Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
As the title says, cleaning up and moving the used/important
functions into myfunctions.py so they can be easily reused.
Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
Final version of V2 of houghline preprocessing.
May need to make changes to this version, but it's complete
and ready for the OCR and actual ML part now.
Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
I believe the idea is to use morphology to clean up the merged
threshold version, get bounding boxes for the letters, build a
mask from those bounding boxes, and then apply the merged thresh
onto a white page using the mask.
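
A minimal sketch of the last step only (pasting the thresholded
pixels onto a white page through the box mask). It uses nested lists
instead of OpenCV arrays so it runs on its own; the function name
paste_through_boxes is hypothetical, and boxes are assumed to be
(x, y, w, h) tuples like the ones cv2.boundingRect returns. The
morphology and letter detection steps are not shown.

```python
WHITE = 255  # assumed 8-bit grayscale: 255 = white page, 0 = black text


def paste_through_boxes(thresh, boxes):
    """Copy pixels of `thresh` that fall inside any box onto a white page.

    thresh: 2D list of pixel values (rows of equal length).
    boxes:  iterable of (x, y, w, h) rectangles, assumed letter boxes.
    """
    height, width = len(thresh), len(thresh[0])
    page = [[WHITE] * width for _ in range(height)]
    for x, y, w, h in boxes:
        # Clamp each box to the image so partial boxes are still usable.
        for row in range(max(y, 0), min(y + h, height)):
            for col in range(max(x, 0), min(x + w, width)):
                page[row][col] = thresh[row][col]
    return page
```

Everything outside the boxes comes out white, so noise that never
got a letter box is dropped without any further filtering.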
Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
Still struggling to get the text to come out cleanly while
removing all the other noise.
Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
Now to implement cleaning up the resulting image so that
the outer area and page are white and the text is black.
Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
Houghline crop now returns a cropped version of the original
image (not a thresholded/black-and-white or shrunken, smaller
picture), just the base image cropped to what it should be.
Now need to add post processing to make the background white
and sharpen the image (make the text a hard black and the page
a hard white).
Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
Currently breaking it down into more digestible functions.
Then need to work on post processing so that the main piece of
paper is kept, the background and the page are set to completely
white, and the text is set to full black.
Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
Now have deskewing and cropping as one function.
Still need to modify it so it refines the photo even more
and returns just the rotated rectangle of the original image
instead of the morphology-affected image.
Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
Didn't work when running on a Mac due to minor changes
and an oversight in running the autocropper container.
Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
First implementation finished. Working on using it to crop.
Going to try using probabilistic Hough lines to pick out lines
within a margin of the correct orientation (vertical) and put a
bounding box around these lines to approximate the receipt.
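
A rough sketch of the filtering idea, assuming segments arrive as
(x1, y1, x2, y2) tuples, which is the shape cv2.HoughLinesP
produces per line. The helper names and the 10 degree default
margin are placeholders, not the values used in the notebooks.

```python
import math


def near_vertical(segment, margin_deg=10.0):
    """True if the segment is within margin_deg of vertical."""
    x1, y1, x2, y2 = segment
    # Angle measured from the vertical axis: 0 = vertical, 90 = horizontal.
    angle = math.degrees(math.atan2(abs(x2 - x1), abs(y2 - y1)))
    return angle <= margin_deg


def bounding_box(segments, margin_deg=10.0):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around the
    near-vertical segments, or None if no segment qualifies."""
    kept = [s for s in segments if near_vertical(s, margin_deg)]
    if not kept:
        return None
    xs = [x for x1, _, x2, _ in kept for x in (x1, x2)]
    ys = [y for _, y1, _, y2 in kept for y in (y1, y2)]
    return min(xs), min(ys), max(xs), max(ys)
```

Example: two near-vertical edges and one horizontal line give a
box around the two edges only.

```python
segments = [(10, 0, 12, 100), (50, 5, 48, 90), (0, 20, 200, 25)]
print(bounding_box(segments))  # (10, 0, 50, 100)
```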
Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
About to add search selection segmentation so that
I can have multiple processes since some fail to work
correctly while others do.
Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
saveaspng.ipynb: Saves the dataset as individual .jpg files
so that bad images can be more quickly seen using OS file
previews
dataset_viewer.ipynb: Lets you iterate through the individual
images of a dataset. A less helpful version of what saveaspng
and OS file previews would do
image_viewer.ipynb: Nearly identical to dataset_viewer.ipynb
manualrotationchecker.ipynb: First version of a manual (non-ML)
autorotator/deskewer
testcropper.ipynb: New version of a manual autocropper
manualcropandrotate.ipynb: Combining manual cropping and rotating
Also updated the training loop file and added a blacklist that is
used when building the dataset from the original dataset. Finally,
the Dockerfile was updated to remove installation of some unused
libraries and to add a library for the manual autorotator.
Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>