Merging Py V2 of the completed autocropper/cleanup #2

ewellenr merged 23 commits from autocropper-test into autocropper 2023-10-18 22:54:23 -04:00

First completed version (hence dubbed V2) of the autocleanup function, though so far it is implemented only in Python.

ewellenr added 23 commits 2023-10-18 22:53:43 -04:00
saveaspng.ipynb: Saves the dataset as individual .jpg files
	so that bad images can be more quickly seen using OS file
	previews
dataset_viewer.ipynb: Lets you iterate through the individual
	images of a dataset. A less helpful version of what saveaspng
	and OS file previews already do.
image_viewer.ipynb: Nearly identical to dataset_viewer.ipynb
manualrotationchecker.ipynb: First version of a manual (non-ML)
	autorotator/deskewer
testcropper.ipynb: New version of a manual autocropper
manualcropandrotate.ipynb: Combining manual cropping and rotating
Also updated the training loop file and added a blacklist used when
building the dataset from the original dataset. Finally, the
Dockerfile was updated to stop installing some unused libraries
and to add a library for the manual autorotator.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
About to add selective search segmentation so that
I can run multiple processes, since some of them fail
to work correctly while others succeed.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
Just a checkpoint

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
Exactly what the title says.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
First implementation finished; now working on using it to crop.
Going to try using probabilistic Hough lines to pick out lines
within a margin of the correct (vertical) orientation, then put a
bounding box around these lines to approximate the receipt.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
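A minimal NumPy sketch of the vertical-line filtering and bounding box described in that commit. It assumes the segments come from `cv2.HoughLinesP`, which returns `None` when it finds no lines and otherwise an array of `x1, y1, x2, y2` rows. The function name `vertical_segments_bbox` and the 10 degree tolerance are hypothetical illustrations, not taken from the repository.

```python
import numpy as np


def vertical_segments_bbox(segments, max_tilt_deg=10.0):
    """Return (x_min, y_min, x_max, y_max) around near-vertical segments, or None.

    `segments` holds x1, y1, x2, y2 rows (cv2.HoughLinesP returns shape (N, 1, 4));
    it is flattened to (N, 4) here. `max_tilt_deg` is a hypothetical tolerance.
    """
    if segments is None:  # cv2.HoughLinesP found nothing
        return None
    seg = np.asarray(segments, dtype=float).reshape(-1, 4)
    dx = seg[:, 2] - seg[:, 0]
    dy = seg[:, 3] - seg[:, 1]
    # Tilt away from vertical in degrees: 0 means perfectly vertical.
    tilt = np.degrees(np.arctan2(np.abs(dx), np.abs(dy)))
    # Drop zero-length segments and anything too far from vertical.
    keep = seg[((dx != 0) | (dy != 0)) & (tilt <= max_tilt_deg)]
    if keep.size == 0:
        return None
    xs = keep[:, [0, 2]]
    ys = keep[:, [1, 3]]
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Two near-vertical receipt edges and one horizontal line that should be ignored.
segments = np.array([[[10, 5, 12, 95]], [[80, 0, 79, 100]], [[0, 50, 100, 52]]])
print(vertical_segments_bbox(segments))  # (10, 0, 80, 100)
```

Keeping the geometry in plain NumPy means the box can be computed from lines detected on a thresholded copy and then applied to whichever image needs cropping.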
Still need to fix threshold/morphology so that the bed
picture works.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
It didn't work when run on Mac due to minor changes
and an oversight in how the autocropper container was run.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
Now to work on Hough line cropping.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
Deskewing still has some issues that need fixing.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
Deskewing and cropping are now one function.
It still needs to refine the photo further
and to return just the rotated rectangle of the original image
instead of the image affected by morphology.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
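Returning "just the rotated rectangle of the original image" amounts to finding the geometry on the processed image but sampling pixels from the untouched original. Below is a minimal NumPy sketch of that sampling step, assuming the rectangle is given as centre, size and angle in degrees (the same shape as the tuple `cv2.minAreaRect` returns). `crop_rotated_rect` is a hypothetical name; OpenCV code would usually do this with `cv2.getRotationMatrix2D` and `cv2.warpAffine`, and since the angle convention of `cv2.minAreaRect` differs between OpenCV versions, the sign of the angle may need flipping.

```python
import numpy as np


def crop_rotated_rect(original, center, size, angle_deg, fill=255):
    """Cut the rotated rectangle (center, size, angle) out of `original`.

    Nearest-neighbour inverse mapping: every output pixel is placed in the
    rectangle's own frame, rotated by `angle_deg` (clockwise on screen, since
    image y points down) and looked up in the original image. Pixels that
    fall outside the original are set to `fill`.
    """
    cx, cy = center
    w, h = int(round(size[0])), int(round(size[1]))
    theta = np.deg2rad(angle_deg)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    v, u = np.mgrid[0:h, 0:w]  # output rows (v) and columns (u)
    du = u - (w - 1) / 2.0  # offsets from the rectangle centre
    dv = v - (h - 1) / 2.0
    x = np.rint(cx + du * cos_t - dv * sin_t).astype(int)
    y = np.rint(cy + du * sin_t + dv * cos_t).astype(int)
    inside = (x >= 0) & (x < original.shape[1]) & (y >= 0) & (y < original.shape[0])
    out = np.full((h, w) + original.shape[2:], fill, dtype=original.dtype)
    out[inside] = original[y[inside], x[inside]]
    return out


img = np.arange(100, dtype=np.uint8).reshape(10, 10)
# Axis-aligned rectangle covering rows 3-5 and columns 2-5.
crop = crop_rotated_rect(img, center=(3.5, 4.0), size=(4, 3), angle_deg=0)
print(np.array_equal(crop, img[3:6, 2:6]))  # True
```

Filling out-of-bounds pixels with white fits the later goal of a white background around the receipt.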
Currently breaking it down into more digestible
functions. Next is post-processing so that the main
piece of paper is kept, the background is set to white,
the paper itself is set to completely white, and the text
is full black.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
The Hough line crop now returns a cropped version of
the original image (not the thresholded black-and-white version
or a shrunken, smaller-size copy): just the base image cropped
to what it should be. Next I need to add post-processing
to make the background white and sharpen the image (make the text
a hard black and the page a hard white).

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
Now to implement cleaning up the resulting image so that
the outer area and the page are white and the text is black.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
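A minimal sketch of that cleanup, assuming a grayscale uint8 crop and a boolean `page_mask` marking the paper. Both inputs are hypothetical; the commits don't say how the mask is produced. The sketch swaps in Otsu's method, written out in NumPy, as the text/page threshold, because the commits don't name the threshold actually used. In OpenCV the same threshold is available through `cv2.threshold` with the `cv2.THRESH_OTSU` flag.

```python
import numpy as np


def otsu_threshold(values):
    """Otsu's threshold for uint8 values: pixels <= t form the dark class."""
    hist = np.bincount(np.ravel(values), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)  # weight of the dark class for each candidate t
    mu = np.cumsum(prob * np.arange(256))  # cumulative mean of the dark class
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    # Thresholds that leave one class empty are undefined; score them as 0.
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(between))


def whiten_page(gray, page_mask):
    """Return a 0/255 image: text black, page white, everything outside the mask white."""
    t = otsu_threshold(gray[page_mask])  # threshold computed on the paper only
    out = np.where(gray > t, 255, 0).astype(np.uint8)
    out[~page_mask] = 255
    return out


gray = np.full((10, 10), 200, np.uint8)
gray[4:6, 2:8] = 30  # a dark "text" stroke on the page
gray[0, :] = 10  # dark background outside the paper
mask = np.zeros((10, 10), dtype=bool)
mask[1:9, 1:9] = True
print(otsu_threshold(gray[mask]))  # 30
```

Computing the threshold only over `gray[page_mask]` keeps a dark background from dragging the text/page split toward black.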
It works, but it isn't tuned yet, so the parameters still
need adjusting before it works well.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
Still struggling to keep the text intact while
removing the rest of the noise.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
I believe the approach is to use morphology to clean
up the merged threshold version, get bounding boxes
for the letters, build a mask from those bounding boxes,
and then apply the merged threshold onto a white page using the mask.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
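The approach described in that commit can be sketched in NumPy as follows. It assumes the "merged threshold" is a binary uint8 image with black (0) ink on white (255). How the merge is done isn't stated, and `paste_text_on_white`, the 3x3 kernel and the 1-pixel box margin are hypothetical choices. The opening and the connected-component scan stand in for what OpenCV code would typically do with `cv2.morphologyEx(..., cv2.MORPH_OPEN, kernel)` and `cv2.connectedComponentsWithStats`.

```python
from collections import deque

import numpy as np


def _shifted(mask, k):
    """All k*k shifted views of `mask`, padded with False at the borders."""
    r = k // 2
    padded = np.pad(mask, r, constant_values=False)
    h, w = mask.shape
    return [padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)]


def binary_open(mask, k=3):
    """Morphological opening (erode, then dilate) with a k x k square."""
    eroded = np.logical_and.reduce(_shifted(mask, k))
    return np.logical_or.reduce(_shifted(eroded, k))


def component_boxes(mask):
    """(top, left, bottom, right) of each 8-connected True region, inclusive."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    boxes = []
    for y0, x0 in zip(*np.nonzero(mask)):
        if seen[y0, x0]:
            continue
        seen[y0, x0] = True
        queue = deque([(y0, x0)])
        top, left, bottom, right = y0, x0, y0, x0
        while queue:  # breadth-first flood fill of one component
            y, x = queue.popleft()
            top, bottom = min(top, y), max(bottom, y)
            left, right = min(left, x), max(right, x)
            for ny in range(max(y - 1, 0), min(y + 2, h)):
                for nx in range(max(x - 1, 0), min(x + 2, w)):
                    if mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
        boxes.append((int(top), int(left), int(bottom), int(right)))
    return boxes


def paste_text_on_white(merged, k=3, margin=1):
    """Copy `merged` onto a white page, but only inside padded bounding boxes
    of the ink blobs that survive the opening."""
    h, w = merged.shape
    cleaned = binary_open(merged == 0, k)
    keep = np.zeros((h, w), dtype=bool)
    for top, left, bottom, right in component_boxes(cleaned):
        keep[max(top - margin, 0):min(bottom + margin + 1, h),
             max(left - margin, 0):min(right + margin + 1, w)] = True
    page = np.full_like(merged, 255)
    page[keep] = merged[keep]
    return page


merged = np.full((12, 12), 255, np.uint8)
merged[3:7, 3:7] = 0  # a blob standing in for a letter
merged[2, 4] = 0  # one-pixel stroke attached to it
merged[10, 10] = 0  # isolated speck
out = paste_text_on_white(merged)
print(out[2, 4], out[10, 10])  # 0 255
```

Because the mask comes from padded boxes rather than from the opened blobs themselves, thin strokes that the opening erodes are copied back from the merged threshold, while isolated specks outside every box stay white.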
Final version of V2 of Hough line preprocessing.
It may still need changes, but it's complete
and ready for the OCR and actual ML part now.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
As the title says: cleaning up and moving the used/important
functions into the myfunctions.py file so they can be easily reused.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
Just tuning the threshold a bit so that the
background whiteout works better.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
As the title says. The plan is to convert all the Hough line
processing to C, since it should be faster than Python OpenCV.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
Moving the testing notebooks around. Autocropping is about done,
with the exception of any new versions or converting the code
to C.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
ewellenr merged commit f91fb1f9b1 into autocropper 2023-10-18 22:54:23 -04:00
ewellenr changed title from Merging Py V1 of the completed autocropper/cleanup to Merging Py V2 of the completed autocropper/cleanup 2023-10-18 22:54:56 -04:00
Reference: ewellenr/receipt_indexer#2