Adding Autocropper V2 to main #4

Merged
ewellenr merged 31 commits from autocropper into main 2023-10-21 19:30:33 -04:00

31 Commits

Author SHA1 Message Date
0eb2ec34c0 Merge branch 'main' of ssh://ssh.git.ewellenr.ca:2222/ewellenr/receipt_indexer into autocropper 2023-10-21 19:29:05 -04:00
f91fb1f9b1 Merge pull request 'Merging Py V1 of the completed autocropper/cleanup' (#2) from autocropper-test into autocropper
Reviewed-on: #2
2023-10-18 22:54:19 -04:00
423b511dd9 Cleanup commit
Moving around the testing notebooks. Autocropping is about done,
except for any new versions or converting the code to C.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
2023-10-18 22:48:24 -04:00
e0ce309a0e Merge branch 'autocropper' of ssh://ssh.git.ewellenr.ca:2222/ewellenr/receipt_indexer into autocropper-test 2023-10-18 22:35:06 -04:00
2888e40ee2 Updated Plan for autocropper
Title. Plan is to just convert all the houghline
processing into C since it should be faster than the python opencv.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
2023-10-18 21:52:29 -04:00
eb04834d66 Updating gitignore for python function file.
As title says. The python file seems to make
a cache directory so we are ignoring it.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
2023-10-18 21:39:39 -04:00
b25ffb8602 V2.1 Tuning update
Just tuning the threshold a bit so that the
background whiteout works better.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
2023-10-18 20:11:18 -04:00
518cea9968 Cleaning up and putting functions into python file
As title says, cleaning up and putting the used/important
functions into myfunctions.py file so they could be easily used.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
2023-10-18 17:25:17 -04:00
011acea572 V2 houghline preprocessing
Final version of V2 of houghline preprocessing.
May need to make changes to this version but it's complete
and ready for the OCR and actual ML part now.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
2023-10-18 17:15:14 -04:00
3a98ad0d1c Quick checkpoint 2 with post processing.
I believe the idea is to use morphology to clean
up the merged threshold version, get bounding boxes
for the letters, build a mask from those bounding boxes,
and then apply the merged thresh onto a white page using the mask.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
2023-10-13 22:34:23 -04:00
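
The masking step described in the commit above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the repository: the helper names `boxes_to_mask` and `apply_on_white` are hypothetical, the image is a plain list of rows instead of an OpenCV array, and the boxes use the `(x, y, w, h)` layout that `cv2.boundingRect` returns. The morphology cleanup and the letter detection are assumed to have happened already.

```python
WHITE = 255  # page/background value; text pixels are 0 (black)

def boxes_to_mask(height, width, boxes):
    """Build a boolean mask that is True inside any (x, y, w, h) box."""
    mask = [[False] * width for _ in range(height)]
    for x, y, w, h in boxes:
        # Clamp each box to the image so partially outside boxes are safe.
        for row in range(max(y, 0), min(y + h, height)):
            for col in range(max(x, 0), min(x + w, width)):
                mask[row][col] = True
    return mask

def apply_on_white(thresh, mask):
    """Copy thresholded pixels onto a white page wherever the mask is True."""
    return [
        [pixel if keep else WHITE for pixel, keep in zip(t_row, m_row)]
        for t_row, m_row in zip(thresh, mask)
    ]

# Hypothetical 3x4 thresholded image: two text pixels in the middle row,
# plus stray black noise pixels in the corners.
thresh = [
    [0, 255, 255, 0],
    [255, 0, 0, 255],
    [0, 255, 255, 255],
]
boxes = [(1, 1, 2, 1)]  # one letter box covering the two text pixels
cleaned = apply_on_white(thresh, boxes_to_mask(3, 4, boxes))
```

Anything outside the letter boxes becomes white, so noise that survived thresholding is dropped while the text pixels keep their thresholded values.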
15b36b2de0 Quick checkpoint with postprocessing.
Still struggling with getting the text set well while
removing all the other noise and such.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
2023-10-11 21:57:03 -04:00
264f8fa80a V1 of final postprocessing for houghline
It works but isn't tuned yet, so it still needs to
be adjusted before it works well.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
2023-10-11 16:07:12 -04:00
882892cb53 Complete houghline deskewing and cropping.
Now to implement cleaning up the resulting image so that
the outer area and page are white and the text is black.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
2023-10-11 01:29:23 -04:00
834f562604 Cropped original image using functions.
Houghline crop now returns a cropped version of
the original image (not a thresholded/black-and-white or
shrunken, smaller-size picture). Just the base image cropped
to what it should be. Need to now add post processing
to make the background white and sharpen the image (make the text
a hard black and the page a hard white).

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
2023-10-11 01:02:16 -04:00
7f796c7a7d Third checkpoint in making houghline crop and deskew
Currently breaking it down into more digestible
functions and then need to work on post processing so
that the main piece is kept and the background is set to
white and the main piece of paper is set to completely white
with text being full black.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
2023-10-10 17:05:40 -04:00
9c2d801be3 Second iteration of houghline cropping and deskewing.
Now have deskewing and cropping as one function.
Need to modify it still so it refines the photo even more
and also returns just the rotated rectangle of the original image
instead of the affected (by morphology) image.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
2023-10-09 23:56:23 -04:00
4c7bb2c54f First implementation of houghline cropping.
Need to fix because deskewing still has some issues.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
2023-10-09 17:20:18 -04:00
b2f3e89014 First complete implementation of hough line deskewing
Now to work on hough line cropping.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
2023-10-09 00:28:26 -04:00
a62f628cc1 Merge branch 'main' of ssh://ssh.git.ewellenr.ca:2222/ewellenr/receipt_indexer into autocropper-test 2023-10-08 13:46:34 -04:00
5bd770da55 First implementation of houghline cropping/deskewing
Still need to fix threshold/morphology so that the bed
picture works.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
2023-10-08 13:44:43 -04:00
d5e7a2eed2 Deskew using Hough Lines
First implementation finished. Working on using it to crop.
Going to try using probabilistic hough lines and a bounding box
to pick out lines within a margin of the correct orientation (vertical)
and put a bounding box around these lines to try and approximate the
receipt.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
2023-10-07 12:02:34 -04:00
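
The cropping idea in the commit above (keep near-vertical line segments, then box them) could look something like the sketch below. It is a minimal illustration, not the repository's implementation: the helper names `near_vertical` and `bounding_box` and the 15 degree margin are assumptions, and the segments are hard-coded `(x1, y1, x2, y2)` tuples in the layout that `cv2.HoughLinesP` produces, so the example runs without OpenCV.

```python
import math

def near_vertical(segments, margin_deg=15.0):
    """Keep segments whose direction is within margin_deg of vertical."""
    kept = []
    for x1, y1, x2, y2 in segments:
        if (x1, y1) == (x2, y2):
            continue  # zero-length segment has no direction
        # Angle from the vertical axis, folded into [0, 90] degrees.
        angle = math.degrees(math.atan2(abs(x2 - x1), abs(y2 - y1)))
        if angle <= margin_deg:
            kept.append((x1, y1, x2, y2))
    return kept

def bounding_box(segments):
    """Axis-aligned (x_min, y_min, x_max, y_max) around all endpoints."""
    if not segments:
        return None
    xs = [x for x1, _, x2, _ in segments for x in (x1, x2)]
    ys = [y for _, y1, _, y2 in segments for y in (y1, y2)]
    return (min(xs), min(ys), max(xs), max(ys))

# Hypothetical segments from an already deskewed photo.
segments = [
    (100, 50, 104, 400),  # left receipt edge, nearly vertical
    (300, 60, 297, 410),  # right receipt edge, nearly vertical
    (20, 500, 380, 505),  # horizontal table edge, filtered out
]
box = bounding_box(near_vertical(segments))
```

The resulting box approximates the receipt region and could then be used to crop the original image. The margin would need tuning against real photos.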
f66a757d8b Start of Hough Line deskew
Exactly what the title says.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
2023-10-06 22:57:18 -04:00
4500f79772 Before trying houghline technique
Just a checkpoint

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
2023-10-06 08:02:16 -04:00
c921f03df2 Point before adding search selection segmentation
About to add search selection segmentation so that
I can have multiple processes since some fail to work
correctly while others do.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
2023-10-04 18:24:17 -04:00
c3e9f73e73 Pushing current training loop and other helpful files
saveaspng.ipynb: Saves the dataset as individual .jpg files
	so that bad images can be more quickly seen using OS file
	previews
dataset_viewer.ipynb: Lets you iterate through the individual
	images of a dataset. A less helpful version of what saveaspng
	and OS file previews would do
image_viewer.ipynb: Nearly identical to dataset_viewer.ipynb
manualrotationchecker.ipynb: First version of a manual (non-ML)
	autorotater/deskewer
testcropper.ipynb: New version of a manual autocropper
manualcropandrotate.ipynb: Combining manual cropping and rotating
Also updated the training loop file and added a blacklist used when
building the dataset from the original dataset. Finally, the
dockerfile was updated to remove installation of some unused libraries
and add a library for the manual autorotator.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
2023-10-04 02:27:12 -04:00
38b332f5c9 End of V1.1 Attempts
End of trying to train my own v1.1 custom model for autorotation.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
2023-09-29 14:08:11 -04:00
d25dbf576a Merge branch 'main' of ssh://ssh.git.ewellenr.ca:2222/ewellenr/receipt_indexer into autocropper 2023-09-26 18:02:45 -04:00
c724e3be78 First adjustment to try and get training to work.
Added a batch normalization layer and adjusted
the file to accommodate a restart in the training process.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
2023-09-26 16:26:16 -04:00
71e712e55d State after first attempt at training
Tried training for a while with current state. Didn't show any
progress with training so I'm trying something else. Might also
just need fewer parameters but we'll see.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
2023-09-26 16:12:39 -04:00
96a4abf6aa First complete implementation of ML Model
Doesn't really work well after training but work in progress.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
2023-09-23 13:51:55 -04:00
61cf15e99c Initializing rotator ML algo
Pulling the rotator ML from the app branch and putting it in this one,
the correct one.

Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
2023-09-19 12:59:57 -04:00