5 Document Verification Framework
This topic describes the document verification framework, which is set up after the Common Core applications are deployed on the WebLogic server.
Prerequisites
Python, Tesseract and other required libraries must be installed for running document verification APIs.
- Operating System: Use the same version as the other banking products.
- Other Oracle Linux versions are not supported because they are incompatible with the gcc++ v17 compiler.
- gcc++ compiler v17
- This compiler is required for Tesseract.
- Install the following OS packages in the order listed:
- yum install libstdc++
- yum install autoconf automake
- yum install libtool
- yum install pkg-config
- yum install gcc gcc-c++ make
- yum install libjpeg-devel libpng-devel libtiff-devel zlib-devel readline readline-devel
- yum install zlib zlib-devel
- yum install libffi-devel openssl-devel
- yum install bzip2 bzip2-devel
- yum install poppler-utils
- yum install xz xz-devel xz-libs
- yum install mesa-libGL
- yum install mesa-libgbm
- yum install mesa-libglapi
- yum install sqlite-devel
- yum install openblas
- yum install python3.12-devel
- yum install python3.12-3.12.6-1.el8_10
- yum install python3.12-pip python3.12-setuptools python3.12-wheel
Note:
The last three commands in the list above (the python3.12 packages) install Python 3.12.6. Do not upgrade the libraries unless instructed in the documentation. To prevent Python from being upgraded, run the following commands after completing the steps above:
- sudo yum install yum-plugin-versionlock
- sudo yum versionlock add python3.12
Tesseract is an optical character recognition (OCR) engine for various operating systems. The latest version must be installed on the machine to extract the text from the documents.
Refer to the Tesseract Installation section in the Oracle Banking Microservices Platform Foundation Installation Guide to manually install the latest version of Tesseract.
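Before moving on, it can help to confirm that the command-line tools installed above are actually reachable. The following is a minimal sketch, not part of the product: it assumes the executables are named tesseract (the Tesseract CLI) and pdftoppm (shipped with poppler-utils), and the function name missing_tools is hypothetical.

```python
import shutil

# Assumption: these are the usual executable names on Linux.
# "tesseract" is the Tesseract CLI; "pdftoppm" ships with poppler-utils.
REQUIRED_TOOLS = ("tesseract", "pdftoppm")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")
```

An empty result only shows that the executables are on PATH; it does not verify their versions.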
Document Verification Application Installation
The app is shipped as a byte-coded whl file. This wheel file installs all the implementation files without the dependencies. All the required dependencies are bundled together in a python.zip file, which must be extracted and installed separately (refer to the dependency installation step below). It is recommended to install the whl file and the dependencies in a virtual environment using "pip" so that the installation does not affect any other operations or applications running on the system.
Applications that use a microservices-based architecture, and rely on it for security, need a Config.ini file. The same file is required for the Eureka server configuration. Create a config file named Config.ini and paste the text below:
[DEFAULT]
eureka_server=http://<Host Name>:<Port Number>/plato-discovery-service/eureka
You can edit the Eureka server address if needed. Do not change the app name, because Role-Based Access Control depends on it. Registering the app on Eureka is optional and can be skipped if not needed, but Config.ini is required in either case. The eureka_server variable can simply be set to a localhost URL.
The folder structure to be followed is:
root_dir
├── python
└── Config.ini
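To check that Config.ini is well formed before starting the server, the file can be parsed with Python's standard configparser module. This is a minimal sketch under the assumption that the file contains only the [DEFAULT] section shown above; the function name read_eureka_server is hypothetical, and this is not necessarily how the application itself reads the file.

```python
import configparser


def read_eureka_server(config_path):
    """Return the eureka_server value from the [DEFAULT] section of Config.ini."""
    # interpolation=None keeps any '%' characters in the URL literal.
    parser = configparser.ConfigParser(interpolation=None)
    # ConfigParser.read returns the list of files it parsed successfully.
    if not parser.read(config_path):
        raise FileNotFoundError(f"Config file not found: {config_path}")
    # DEFAULT is configparser's built-in default section, so this lookup
    # works even though the file defines no other section.
    return parser["DEFAULT"]["eureka_server"]
```

Calling read_eureka_server("Config.ini") from root_dir returns the configured URL, and raises KeyError if the eureka_server key is missing.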
Note:
Make sure that the operating system is Linux, the installed Python version is 3.12.6, and the pip version is above 20.0.0. Run the following command to upgrade pip to the latest version:
pip install --upgrade pip
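The version requirements in this note can be checked from inside the virtual environment. The sketch below is illustrative only: the helper names version_tuple and environment_ok are hypothetical, and the pip version is read through importlib.metadata, which reports the pip installed for the running interpreter.

```python
import re
import sys
from importlib import metadata


def version_tuple(text):
    """Convert the leading dotted-numeric part of a version string to a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"Unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def environment_ok(python_version, pip_version):
    """Check the documented requirements: Python 3.12.6 and pip above 20.0.0."""
    python_matches = version_tuple(python_version) == (3, 12, 6)
    pip_is_newer = version_tuple(pip_version) > (20, 0, 0)
    return python_matches and pip_is_newer


if __name__ == "__main__":
    current_python = ".".join(str(n) for n in sys.version_info[:3])
    try:
        current_pip = metadata.version("pip")
    except metadata.PackageNotFoundError:
        current_pip = "0"  # pip is not installed for this interpreter
    print(current_python, current_pip, environment_ok(current_python, current_pip))
```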
- Use the command below to install the application wheel package provided, for example ofss_ml_document_verification_server_without_req-{version}-py3-none-any.whl:
pip install <wheel_package_name>.whl
Note:
Refer to the OSDC file for the exact version number.
- Next, install all the dependencies. To do this, extract the python.zip file provided, go into the python folder (cd python/), and run the following commands:
pip install configparser --no-index --find-links .
pip install connexion --no-index --find-links .
pip install datefinder --no-index --find-links .
pip install dateparser --no-index --find-links .
pip install Flask --no-index --find-links .
pip install importlib-metadata --no-index --find-links .
pip install opencv-python --no-index --find-links .
pip install pdf2image --no-index --find-links .
pip install Pillow --no-index --find-links .
pip install pyap --no-index --find-links .
pip install pybase64 --no-index --find-links .
pip install pytesseract --no-index --find-links .
pip install python-dateutil --no-index --find-links .
pip install six --no-index --find-links .
pip install pyxDamerauLevenshtein --no-index --find-links .
pip install python-magic --no-index --find-links .
(optional) pip install py-eureka-client --find-links .
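The same install sequence can be scripted so that it stops at the first failure. The following is a sketch rather than a supplied installer: the names DEPENDENCIES, pip_install_command and install_all are hypothetical, and the optional py-eureka-client package is left out. It assumes the script is run from inside the extracted python folder, so that "." points at the bundled packages.

```python
import subprocess
import sys

# Package names as listed in the install step above (optional package omitted).
DEPENDENCIES = [
    "configparser", "connexion", "datefinder", "dateparser", "Flask",
    "importlib-metadata", "opencv-python", "pdf2image", "Pillow", "pyap",
    "pybase64", "pytesseract", "python-dateutil", "six",
    "pyxDamerauLevenshtein", "python-magic",
]


def pip_install_command(package, links_dir=".", offline=True):
    """Build the argument list for one pip install from a local directory."""
    command = [sys.executable, "-m", "pip", "install", package]
    if offline:
        # --no-index stops pip from contacting a package index.
        command.append("--no-index")
    # --find-links points pip at the directory holding the bundled archives.
    command += ["--find-links", links_dir]
    return command


def install_all(packages=DEPENDENCIES, links_dir="."):
    """Install each package in order; check=True raises on the first failure."""
    for package in packages:
        subprocess.run(pip_install_command(package, links_dir), check=True)
```

Invoking pip as `sys.executable -m pip` is a design choice here: it uses the pip that belongs to the active virtual environment rather than whichever pip happens to be first on PATH.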
Note:
This application works when the above libraries are installed with the required versions. Do not upgrade the libraries unless instructed in the documentation.
- After installing the wheel package and the dependencies, run the document verification server using the following command:
python -m ofss_ml_document_verification_server
- By default, this runs the app on port 8090 and does not register it with Eureka. To run on a different port and register with Eureka, use the following command:
python -m ofss_ml_document_verification_server -p 5000 -r true
The above command runs the app on port 5000 and also registers it with the Eureka server. The -p and -r arguments can be used together or independently, and any available port number can be used. By default, registration is disabled (-r false).
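To illustrate that the two arguments are independent, the launch command can be assembled programmatically. This is a sketch only: the helper server_command is hypothetical, and it relies solely on the -p and -r options described above.

```python
import sys

MODULE = "ofss_ml_document_verification_server"


def server_command(port=None, register=None):
    """Build the launch command; omitted options fall back to the app defaults."""
    command = [sys.executable, "-m", MODULE]
    if port is not None:
        if not 1 <= int(port) <= 65535:
            raise ValueError(f"Port out of range: {port}")
        command += ["-p", str(port)]
    if register is not None:
        # The documented values for -r are "true" and "false".
        command += ["-r", "true" if register else "false"]
    return command
```

For example, server_command(port=5000, register=True) yields the arguments of the command shown above, and the resulting list can be passed to subprocess.run.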
Please note that once the service is registered on Eureka, it will need role-based access to send and receive requests.
For example, if the app is registered on
http://<Host Name>:<Port Number>/plato-discovery-service
then obtain a bearer token from
http://<Host Name>:<Port Number>/api-gateway/platojwtauth
and call
http://<Host Name>:<Port Number>/api-gateway/ofss_ml_document_verification_server/extractInformation
with the following headers:
1. Authorization: bearer <token>
2. appid (for example, CMNCORE)
3. branchCode
4. Content-Type: application/json
5. userId
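The header list above can be turned into a request as sketched below, using only the Python standard library. Several things here are assumptions and are not confirmed by this guide: that the endpoint is called with POST, that the request body is JSON supplied by the caller (its schema is not described here), and the helper names build_headers and extract_information_request. Obtaining the token from the platojwtauth endpoint is not shown.

```python
import json
import urllib.request


def build_headers(token, app_id, branch_code, user_id):
    """Assemble the five headers listed above."""
    return {
        # Scheme written as in the header list above.
        "Authorization": f"bearer {token}",
        "appid": app_id,
        "branchCode": branch_code,
        "Content-Type": "application/json",
        "userId": user_id,
    }


def extract_information_request(host, port, headers, payload):
    """Build (but do not send) a request to the extractInformation endpoint."""
    url = (
        f"http://{host}:{port}/api-gateway/"
        "ofss_ml_document_verification_server/extractInformation"
    )
    # Assumption: the body is JSON; its fields are defined by the API, not here.
    body = json.dumps(payload).encode("utf-8")
    # Assumption: the endpoint accepts POST.
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

The request object can then be sent with urllib.request.urlopen once real host, port, token and payload values are available.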
The userId and branchCode values are based on the flyway script entries.
SMS Scripts:
Insert into SMS_TM_SERVICE_ACTIVITY (SERVICE_ACTIVITY_CODE,DESCRIPTION,CLASS_NAME,METHOD_NAME,APPLICATION_ID,SERVICE_TYPE,UI_ACTIVITY_CODE) values ('CMC_SA_OFSS_ML_DOCUMENT_VERIFICATION_EXTRACT_INFORMATION','Extracts meaningful details from an image','OFSS_ML_DOCUMENT_VERIFICATION_SERVER','extractInformation','CMC','Service API',null); commit;
Insert into SMS_TM_FUNCTIONAL_ACTIVITY (FUNCTIONAL_ACTIVITY_CODE, APPLICATION_ID, TYPE) values ('CMC_FA_OFSS_ML_DOCUMENT_VERIFICATION_EXTRACT_INFORMATION','CMC','O'); commit;
Insert into SMS_TM_FUNC_ACTIVITY_DETAIL (ID,FUNCTIONAL_ACTIVITY_CODE,SERVICE_ACTIVITY_CODE) values ('CMC_FD_OFSS_ML_DOCUMENT_VERIFICATION_EXTRACT_INFORMATION', 'CMC_FA_OFSS_ML_DOCUMENT_VERIFICATION_EXTRACT_INFORMATION', 'CMC_SA_OFSS_ML_DOCUMENT_VERIFICATION_EXTRACT_INFORMATION'); commit;
Plato Scripts:
Insert into PROPERTIES (ID,APPLICATION,PROFILE,LABEL,KEY,VALUE) values (PROPERTIES_ID_SEQ.NEXTVAL,'plato-api-gateway','jdbc','jdbc','zuul.routes.ofssmldoc.path','/ofss_ml_document_verification_server/**');
Insert into PROPERTIES (ID,APPLICATION,PROFILE,LABEL,KEY,VALUE) values (PROPERTIES_ID_SEQ.NEXTVAL,'plato-api-gateway','jdbc','jdbc','zuul.routes.ofssmldoc.serviceId','ofss_ml_document_verification_server');
Insert into PROPERTIES (ID,APPLICATION,PROFILE,LABEL,KEY,VALUE) values (PROPERTIES_ID_SEQ.NEXTVAL,'plato-api-gateway','jdbc','jdbc','zuul.routes.ofssmldoc.stripPrefix','false');
commit;
This procedure makes sure that only authenticated users can use the API. However, developers running the app have the option to disable registration on Eureka and test the API directly.
- To run the document verification server in the background, use the following command:
nohup python -m ofss_ml_document_verification_server & tail -f nohup.out
Note:
After this command is executed, all execution logs are written to nohup.out (a text file). The terminal can then be closed and the app keeps running on its port.
- To terminate the app, use the netstat command to find the process ID that is using the port on which the app is running, and then use the kill command with that process ID, as shown below:
netstat -nlp | grep 8090
kill -9 <process_id>
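Note that grep 8090 matches any line containing those digits, for example a process listening on port 18090. The sketch below shows one way to pick out only the PID bound to the exact port. It assumes the column layout printed by the net-tools version of netstat -nlp (protocol, Recv-Q, Send-Q, local address, foreign address, optional state, PID/Program name); the function name listening_pids is hypothetical.

```python
import re


def listening_pids(netstat_output, port):
    """Return PIDs of sockets whose local address ends with :<port>."""
    pids = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows start with the protocol; the local address is column four.
        if len(fields) < 6 or not fields[0].startswith(("tcp", "udp")):
            continue
        if not fields[3].endswith(f":{port}"):
            continue
        # The PID/Program name column looks like "12345/python"; it is "-"
        # when netstat lacks permission to see the owning process.
        for field in fields[5:]:
            match = re.fullmatch(r"(\d+)/.*", field)
            if match:
                pid = int(match.group(1))
                if pid not in pids:
                    pids.append(pid)
                break
    return pids
```

The input can be captured with subprocess.run(["netstat", "-nlp"], capture_output=True, text=True).stdout, and each returned PID can then be passed to the kill command shown above.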