Extract Text from a PDF File

The following code sample extracts text content from a PDF file using documentCapture.documentToText(options). The file is located in the NetSuite File Cabinet and specified using its internal ID, but you can also specify a file using its name and path.
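As a minimal sketch of the path-based alternative mentioned above: file.load(options) accepts either an internal ID or a File Cabinet path in its id option. The helper name loadPdf and the path shown here are hypothetical and do not come from the sample. The N/file module is passed in as a parameter so that the helper does not depend on how the script is loaded.

```javascript
// Hypothetical File Cabinet path; replace it with a path that exists in your account.
const INVOICE_PATH = 'SuiteScripts/invoices/sample_invoice.pdf';

/**
 * Loads a file by internal ID or by File Cabinet path.
 * fileModule is the N/file module object; idOrPath is a number or a string.
 */
function loadPdf(fileModule, idOrPath) {
    // file.load(options) takes the internal ID or the path in the same "id" option.
    return fileModule.load({ id: idOrPath });
}
```

Inside the require callback of a script like the sample on this page, you could then call loadPdf(file, INVOICE_PATH) in place of file.load({ id: "14" }).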

After the content is extracted, it's passed to the llm.generateText(options) method of the N/llm module, which lets you use a large language model (LLM) to analyze the content by providing a suitable prompt. Because the data returned from documentCapture.documentToText(options) can be supplied to llm.generateText(options) as a document, you can combine the two modules for more advanced use cases. For more information about the N/llm module, see N/llm Module.

In this example, the LLM is asked about the purpose of the provided invoice. The response includes citations, which indicate the content in the document that was used to generate the response. Finally, the LLM response and the citations are logged.

For instructions about how to run a SuiteScript 2.1 code snippet in the debugger, see On-Demand Debugging of SuiteScript 2.1 Scripts.

Note:

This sample script uses the require function so that you can copy it into the SuiteScript Debugger and test it. You must use the define function in an entry point script (the script you attach to a script record and deploy). For more information, see SuiteScript 2.x Script Basics and SuiteScript 2.x Script Types.

require(['N/file', 'N/documentCapture', 'N/llm'],
    function(file, documentCapture, llm) {
        // "14" is the unique ID of a PDF stored in the NetSuite File Cabinet
        const fileObj = file.load({
            id: "14"
        });

        // Extract the text content of the PDF
        const extractedData = documentCapture.documentToText({
            file: fileObj
        });

        // Provide the extracted text to the LLM as a document so that
        // the response can cite it
        const response = llm.generateText({
            prompt: "What is this invoice for?",
            documents: [{
                id: "14",
                data: extractedData
            }]
        });

        // Log the LLM response and the citations
        log.debug("Answer: ", response.text);
        log.debug("Citations: ", response.citations);
    }
);
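The citations logged by the sample are objects, so they can be hard to scan in the execution log. The following sketch formats them as one line each. It assumes that each citation exposes documentIds, start, end, and text properties, which is how the llm.Citation object is understood here; verify the property names against the N/llm documentation before relying on them. The helper name summarizeCitations is hypothetical.

```javascript
/**
 * Turns an array of citation-like objects into one-line summaries.
 * Assumed shape of each citation (verify against the N/llm docs):
 * { documentIds: string[], start: number, end: number, text: string }
 */
function summarizeCitations(citations) {
    // Treat a missing citations array as empty so the caller can always join the result.
    return (citations || []).map(function (citation, index) {
        const docs = (citation.documentIds || []).join(', ');
        return (index + 1) + '. [' + docs + '] ' +
            citation.start + '-' + citation.end + ': ' + citation.text;
    });
}
```

In the sample, the last logging call could then become log.debug("Citations: ", summarizeCitations(response.citations).join("\n")).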

        
