ewellenr/receipt_indexer

Updated text extractor #16

Merged

ewellenr merged 19 commits from textextractor-test into textextractor

2023-11-13 23:01:54 -05:00

Author	SHA1	Message	Date
Ethan Wellenreiter	c5e2ef3634	Finishing this iteration of the text extractor. Still extracts barcodes and stuff, not just text. That's the possible thing I see left to fix. Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>	2023-11-13 22:52:32 -05:00
Ethan Wellenreiter	7cf80d2f8a	Checkpoint for working text extraction. Implemented first and sub line clustering. A few touchups to do. Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>	2023-11-13 22:44:02 -05:00
Ethan Wellenreiter	a2e6fe715b	Checkpoint in line isolation. In between testing out using a line morphology. Can't seem to dial it in for the subclusters. Want to try combining the old and new technique. That is, the line morphology for the first full receipt and then the letter technique for the subclustering. Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>	2023-11-10 23:51:03 -05:00
Ethan Wellenreiter	3b60d26c30	Adding a line morphology to grab tiny bits of characters. Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>	2023-11-09 22:33:09 -05:00
Ethan Wellenreiter	ec8df21fa6	Working line extractor with an asterisk. It does not work if there are small dots near the edge. For example, if a small bit of a character is detached or a colon is really low/high. Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>	2023-11-09 11:43:49 -05:00
Ethan Wellenreiter	85935e13f1	Merge branch 'autocropper-test' of ssh://ssh.git.ewellenr.ca:2222/ewellenr/receipt_indexer into textextractor-test	2023-11-08 18:16:16 -05:00
Ethan Wellenreiter	589133069d	Another update to text clarifier. Tried with a kernel based on character size but it didn't work. Also removed the old adaptive kernel, which was based on image size. Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>	2023-11-08 18:07:02 -05:00
Ethan Wellenreiter	a54ca827cf	Working towards updating line extractor. Still need to update the text clarifier so that characters from adjacent lines aren't merged together, but there is now a deskew step in between line clustering passes. Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>	2023-11-07 23:30:57 -05:00
Ethan Wellenreiter	12eff9c27c	Merge branch 'main' of ssh://ssh.git.ewellenr.ca:2222/ewellenr/receipt_indexer into textextractor-test	2023-11-04 10:27:09 -04:00
Ethan Wellenreiter	2aabfdcfd5	Merge branch 'main' of ssh://ssh.git.ewellenr.ca:2222/ewellenr/receipt_indexer into autocropper-test	2023-11-04 10:26:11 -04:00
Ethan Wellenreiter	346c4f3cdd	Adding file with helpful links Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>	2023-11-04 10:25:53 -04:00
Ethan Wellenreiter	1a706e19d1	Some changes for implementing a model to extract text. Plan to try and train a Donut model. Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>	2023-11-04 10:24:39 -04:00
Ethan Wellenreiter	c6062a6d93	Beginning model implementation. Tried easyOCR but it was pretty bad so I'm going to try the pytorch based model TrOCR which uses the MIT Licence. `6f60612e7c/LICENSE` Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>	2023-10-30 21:26:01 -04:00
Ethan Wellenreiter	e3c5650f0a	Updating and adding test images. Updated refined images and also added images of the extracted lines using the new autocropper and line extraction functions respectively. Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>	2023-10-30 17:03:33 -04:00
Ethan Wellenreiter	2f50b048ac	Updating the text clarifier to try and connect letters better. Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>	2023-10-30 16:59:29 -04:00
Ethan Wellenreiter	9d4dd9b08b	Updated/tweaked line extractor. Generally extracts the lines well. There might be some errors in the future so it needs to be checked. Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>	2023-10-30 16:57:44 -04:00
Ethan Wellenreiter	599b9bc437	Merge branch 'main' of ssh://ssh.git.ewellenr.ca:2222/ewellenr/receipt_indexer into textextractor-test	2023-10-30 14:53:11 -04:00
Ethan Wellenreiter	d0bf58a21e	Merge branch 'main' of ssh://ssh.git.ewellenr.ca:2222/ewellenr/receipt_indexer into textextractor-test	2023-10-30 00:40:42 -04:00
Ethan Wellenreiter	b40d7379fc	Small changes. Just switching branches. Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>	2023-10-24 18:09:05 -04:00
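
Note on the line-extraction failure mode described in ec8df21fa6: a detached speck (a dot, or a colon sitting unusually low or high) can end up as its own "line". The sketch below is only an illustration of that problem and of one possible mitigation. It is not the code in this branch. It uses a plain row-projection split instead of the line morphology mentioned in the commits, and the names `find_line_bands` and `max_gap` are hypothetical.

```python
from typing import List, Tuple

# Binary image as nested lists: 1 = ink, 0 = background (assumed representation).
Image = List[List[int]]


def find_line_bands(img: Image, max_gap: int = 0) -> List[Tuple[int, int]]:
    """Return inclusive (top, bottom) row ranges that contain ink.

    Bands separated by at most `max_gap` blank rows are merged, so a
    detached dot just above or below a text line joins that line.
    """
    bands: List[Tuple[int, int]] = []
    start = None
    for y, row in enumerate(img):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a new band begins
        elif not has_ink and start is not None:
            bands.append((start, y - 1))   # band ended on the previous row
            start = None
    if start is not None:                  # band runs to the bottom edge
        bands.append((start, len(img) - 1))

    merged: List[Tuple[int, int]] = []
    for top, bottom in bands:
        blank_rows = top - merged[-1][1] - 1 if merged else None
        if merged and blank_rows <= max_gap:
            merged[-1] = (merged[-1][0], bottom)
        else:
            merged.append((top, bottom))
    return merged


# Toy receipt: row 1 holds detached dots, rows 3-4 and 7-8 hold two text lines.
toy = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 1],
]

naive_bands = find_line_bands(toy)              # dots become their own band
merged_bands = find_line_bands(toy, max_gap=1)  # dots join the line below
```

With `max_gap=0` the detached dots are reported as a separate band; with `max_gap=1` they are folded into the adjacent text line while the two real lines stay separate. On real receipts the gap tolerance would presumably need to scale with character height.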