OpenAI Summary (ChatGPT) Nuxeo Integration
Generates a text summary of a file using OpenAI ChatGPT.
Architecture
Input: Nuxeo documents with text files
Process:
a. Blob Extraction: Get the Blob from the file:content property of the document.
b. Blob Conversion: Supported text files are converted to PDF using the any2pdf converter.
c. Text Processing: Process each page one by one.
d. Nuxeo Stream: Use Nuxeo Stream to process the text chunks asynchronously:
- Producer/Consumer Pattern
- SummaryServiceImpl.summaryProducer() -> Splits the text into pages and sends them to the Stream Log.
- PageSummaryComputation -> For each text chunk, calls the OpenAI ChatGPT summary endpoint and saves the result to the KVS (key-value store).
- SummaryDoneComputation -> Reads the page summaries from the KVS, merges them in page order, and saves the merged summary.
e. Event Listeners:
- PostCommitEventListener -> processes multiple documents
- EventListener -> performs the blobIsDirty check
Output: A Nuxeo document with the merged summary of all pages.
The architecture above gives a high-level view of the key components and how they interact. The solution processes pages one at a time to keep memory usage low, and uses Nuxeo’s Stream Service for asynchronous processing.
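The store-then-merge idea behind the page computations can be illustrated without any Nuxeo dependency. The following is a minimal sketch, not the plugin's code: `PageSummarySketch`, the in-memory map standing in for the KVS, the `docId:pageIndex` key format, the pool size, and the injected `summarizer` function (standing in for the OpenAI call) are all assumptions made for illustration. It shows why page summaries keyed by page index can be merged in the right order even when they complete out of order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical stand-in for the producer/consumer flow; not the plugin's classes. */
final class PageSummarySketch {

    /** In-memory stand-in for the KVS: key = "<docId>:<pageIndex>", value = page summary. */
    private final Map<String, String> kvs = new ConcurrentHashMap<>();

    /** Stand-in for the per-chunk OpenAI call; must not return null. */
    private final Function<String, String> summarizer;

    PageSummarySketch(Function<String, String> summarizer) {
        this.summarizer = summarizer;
    }

    /** Summarizes every page concurrently, then merges the results in page order. */
    String summarize(String docId, List<String> pages) {
        // Assumed pool size; it plays the role of the stream consumers.
        ExecutorService consumers = Executors.newFixedThreadPool(4);

        // Producer side: one task per page, each tagged with its page index.
        for (int i = 0; i < pages.size(); i++) {
            final int pageIndex = i;
            final String pageText = pages.get(i);
            // Consumer side: summarize a single page and store it under its index.
            consumers.submit(() -> {
                kvs.put(docId + ":" + pageIndex, summarizer.apply(pageText));
            });
        }

        consumers.shutdown();
        try {
            if (!consumers.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("page summaries did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for page summaries", e);
        }

        // "Done" step: read results back by index, so completion order does not matter.
        List<String> ordered = new ArrayList<>();
        for (int i = 0; i < pages.size(); i++) {
            ordered.add(kvs.remove(docId + ":" + i));
        }
        return String.join("\n", ordered);
    }
}
```

For example, `new PageSummarySketch(String::toUpperCase).summarize("doc-1", List.of("a", "b", "c"))` joins the three per-page results in page order, one per line. Only one page of text is held by each task at a time, which is the memory property the design aims for.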
Configuration
- Add your OpenAI API token to nuxeo.conf:
openai.token:
- Default configurations:
summary.extraction.enable.mime-types=text/plain,application/pdf,application/msword,application/vnd.openxmlformats-officedocument.wordprocessingml.document
summary.extraction.openai.url=https://api.openai.com/v1/completions
summary.extraction.openai.model=text-davinci-003
summary.extraction.openai.temperature=0.3
summary.extraction.openai.max-tokens=200
summary.extraction.openai.top-p=1
summary.extraction.openai.frequency-penalty=0
summary.extraction.openai.presence-penalty=0
summary.extraction.http.max-retries=3
summary.extraction.http.sec-delay=10
summary.extraction.http.sec-max-delay=6
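
To make the role of these settings concrete, here is a minimal sketch of how they could be read and turned into a request body for the completions endpoint. It is an illustration under assumptions, not the plugin's implementation: the class `SummaryConfigSketch` and its methods are hypothetical, and the mapping from each `summary.extraction.openai.*` key to a request field (`model`, `prompt`, `temperature`, `max_tokens`, `top_p`, `frequency_penalty`, `presence_penalty`) is assumed from the key names. The field names themselves are those of the OpenAI completions API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.Properties;

/** Hypothetical helper: reads the settings and builds a completions request body. */
final class SummaryConfigSketch {

    private static final String PREFIX = "summary.extraction.";

    private final Properties props = new Properties();

    /** Parses key=value lines in the same format as the defaults listed above. */
    SummaryConfigSketch(String confText) {
        try {
            props.load(new StringReader(confText));
        } catch (IOException e) {
            throw new IllegalArgumentException("unreadable configuration", e);
        }
    }

    /** True when the MIME type appears in the comma-separated enable list. */
    boolean isEnabled(String mimeType) {
        String enabled = props.getProperty(PREFIX + "enable.mime-types", "");
        return Arrays.stream(enabled.split(",")).map(String::trim).anyMatch(mimeType::equals);
    }

    /** Builds the JSON body; the key-to-field mapping is an assumption. */
    String completionBody(String prompt) {
        return "{"
                + "\"model\":\"" + escape(get("openai.model")) + "\","
                + "\"prompt\":\"" + escape(prompt) + "\","
                + "\"temperature\":" + get("openai.temperature") + ","
                + "\"max_tokens\":" + get("openai.max-tokens") + ","
                + "\"top_p\":" + get("openai.top-p") + ","
                + "\"frequency_penalty\":" + get("openai.frequency-penalty") + ","
                + "\"presence_penalty\":" + get("openai.presence-penalty")
                + "}";
    }

    /** Returns a required setting, failing loudly when it is absent. */
    private String get(String key) {
        String value = props.getProperty(PREFIX + key);
        if (value == null) {
            throw new IllegalStateException("missing setting: " + PREFIX + key);
        }
        return value.trim();
    }

    /** Minimal JSON string escaping for quotes, backslashes, and control characters. */
    private static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }
}
```

In a real deployment a JSON library would normally replace the hand-written string building. The sketch avoids one only to stay dependency-free. The body would be sent as a POST to the URL in `summary.extraction.openai.url`, with the token from `openai.token` as a bearer credential.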