OpenAI Summary (ChatGPT) Nuxeo Integration

Generates a text summary from a file using OpenAI ChatGPT

Technical Documentation


This integration uses a Producer/Consumer pattern for asynchronous processing of the input file.
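As a rough illustration of the Producer/Consumer pattern, the following JDK-only Java sketch has one producer put page records on a queue while several consumer threads summarize them concurrently. It is not the integration's Nuxeo Stream code: the class, the PageRecord type, and the summarizer function are hypothetical stand-ins, and no OpenAI call is made.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.UnaryOperator;

// Hypothetical JDK-only illustration of the producer/consumer pattern.
// It does not use Nuxeo Stream and does not call OpenAI.
final class ProducerConsumerSketch {

    // One unit of work: a page of text and its position in the document.
    record PageRecord(int index, String text) {}

    // Sentinel record that tells a consumer thread to stop.
    private static final PageRecord POISON = new PageRecord(-1, "");

    static Map<Integer, String> summarizePages(List<String> pages, int consumers,
                                               UnaryOperator<String> summarizer) {
        BlockingQueue<PageRecord> queue = new LinkedBlockingQueue<>();
        Map<Integer, String> results = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);

        // Consumers: take records until the sentinel arrives.
        for (int c = 0; c < consumers; c++) {
            pool.execute(() -> {
                try {
                    PageRecord rec;
                    while ((rec = queue.take()) != POISON) {
                        results.put(rec.index(), summarizer.apply(rec.text()));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        try {
            // Producer: one record per page, then one sentinel per consumer.
            for (int i = 0; i < pages.size(); i++) {
                queue.put(new PageRecord(i, pages.get(i)));
            }
            for (int c = 0; c < consumers; c++) {
                queue.put(POISON);
            }
            pool.shutdown();
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("Consumers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow();
            throw new IllegalStateException("Interrupted while producing records", e);
        }
        return results;
    }

    public static void main(String[] args) {
        Map<Integer, String> summaries = summarizePages(
                List.of("Text of page one.", "Text of page two."), 2,
                text -> "Summary of: " + text);
        // Sort by page index for display.
        System.out.println(new TreeMap<>(summaries));
    }
}
```

Because consumers can finish in any order, results are keyed by page index, so a later step has to restore page order before merging.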


The input for this integration is a file uploaded to Nuxeo. If the file is not a PDF, it is first converted to PDF using the registered converters; this step is called Blob Conversion.


The architecture consists of the following core concepts:

  1. Blob Conversion: This operation converts non-PDF files to the PDF format using registered converters.
  2. Nuxeo Stream (Bulk Actions): A Nuxeo Bulk Action processes text chunks and assembles the summary asynchronously. It involves:
    • SummaryServiceImpl.summaryProducer(): This method converts the document to PDF (if it is not one already), splits the text into pages, and produces records.
    • PageSummaryComputation: This computation calls OpenAI to summarize each text chunk and saves the result to a Key-Value Store (KVS).
    • SummaryDoneComputation: Once all consumers are done, this computation merges the page summaries stored in the KVS in page order and saves the merged summary.
  3. Event Listeners:
    • SummaryListener: This listener is notified when a document is created or modified. If the document's main blob is dirty (that is, it has changed), the listener fires an “extractSummaryEvent” event.
    • BulkSummarizeListener: This listener catches the “extractSummaryEvent”, converts the blob to a PDF, and splits the document into pages (producer role). It is configured to run in a dedicated queue named “bulkSummarizeListener”.
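The ordering guarantee of SummaryDoneComputation can be sketched in plain Java as follows. This is a stand-in built on stated assumptions, not the real computation: a ConcurrentHashMap plays the role of the KVS, the key layout "<commandId>:<pageIndex>" and the newline separator between page summaries are hypothetical, and the Nuxeo Stream and Bulk Action plumbing is omitted.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical in-memory stand-in for the Key-Value Store (KVS).
// The key layout "<commandId>:<pageIndex>" is an assumption for illustration only.
final class OrderedSummaryMerge {

    private final Map<String, String> kvs = new ConcurrentHashMap<>();

    // Role of PageSummaryComputation: store one page summary under its page index.
    void savePageSummary(String commandId, int pageIndex, String summary) {
        kvs.put(commandId + ":" + pageIndex, summary);
    }

    // Role of SummaryDoneComputation: once every page is present, merge the
    // summaries in page order (not in completion order).
    Optional<String> mergeIfComplete(String commandId, int totalPages) {
        boolean complete = IntStream.range(0, totalPages)
                .allMatch(i -> kvs.containsKey(commandId + ":" + i));
        if (!complete) {
            return Optional.empty();
        }
        // Assumption: page summaries are joined with a newline.
        return Optional.of(IntStream.range(0, totalPages)
                .mapToObj(i -> kvs.get(commandId + ":" + i))
                .collect(Collectors.joining("\n")));
    }

    public static void main(String[] args) {
        OrderedSummaryMerge store = new OrderedSummaryMerge();
        // Consumers may finish out of order: page 1 arrives before page 0.
        store.savePageSummary("cmd-1", 1, "Second page summary.");
        System.out.println(store.mergeIfComplete("cmd-1", 2).isPresent());
        store.savePageSummary("cmd-1", 0, "First page summary.");
        System.out.println(store.mergeIfComplete("cmd-1", 2).orElseThrow());
    }
}
```

Keying each page summary by its index is what lets the final merge be independent of the order in which consumers complete.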


The output is a Nuxeo document with the merged summary of all PDF pages. A facet, “SummaryFacet”, is dynamically added to all documents that can be summarized. The summary is saved on the document in a new property: “summary:summary”.

Additionally, the following properties are set:

  • “summary:lastComputed”: The date the summary was last computed.
  • “summary:status”: Indicates the status of the summary generation – DONE if successful, ERROR if not.
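To make the output contract concrete, here is a minimal sketch of the property values written for a successful and a failed run. A plain Map stands in for the Nuxeo document, and two details are assumptions: using Instant for the date, and leaving summary:summary and summary:lastComputed untouched on failure. Only the property names and the DONE/ERROR values come from this documentation.

```java
import java.io.Serializable;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Map stands in for the Nuxeo document's properties.
final class SummaryOutputSketch {

    // Status values documented for the summary:status property.
    enum SummaryStatus { DONE, ERROR }

    // Successful run: store the merged summary, the computation date, and DONE.
    static Map<String, Serializable> done(String mergedSummary, Instant computedAt) {
        Map<String, Serializable> props = new HashMap<>();
        props.put("summary:summary", mergedSummary);
        props.put("summary:lastComputed", computedAt);
        props.put("summary:status", SummaryStatus.DONE.name());
        return props;
    }

    // Failed run: only the status is set (assumption: summary and date keep their old values).
    static Map<String, Serializable> error() {
        Map<String, Serializable> props = new HashMap<>();
        props.put("summary:status", SummaryStatus.ERROR.name());
        return props;
    }

    public static void main(String[] args) {
        System.out.println(done("A short summary.", Instant.now()).get("summary:status"));
        System.out.println(error().get("summary:status"));
    }
}
```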


Follow the steps below to configure the integration:

  1. Add your OpenAI API token to nuxeo.conf as openai.token:<api-token>.
  2. The default configuration includes:
    • Enabling or disabling automatic summary generation at document creation, by setting the corresponding property to true or false.
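Assuming nuxeo.conf is read with java.util.Properties semantics, where both ':' and '=' are valid key/value separators, the token line from step 1 can be parsed and validated as in this sketch. The class and method names are hypothetical, and the integration itself may read the property differently.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper; assumes nuxeo.conf follows java.util.Properties syntax.
final class TokenConfigSketch {

    // Returns the API token, failing fast when it is missing or blank.
    static String readToken(String confText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(confText));
        } catch (IOException e) {
            throw new IllegalArgumentException("Unreadable configuration", e);
        }
        String token = props.getProperty("openai.token");
        if (token == null || token.isBlank()) {
            throw new IllegalStateException("openai.token is not set");
        }
        return token.trim();
    }

    public static void main(String[] args) {
        // Properties accepts both ':' and '=' as separators.
        System.out.println(readToken("openai.token:my-api-token"));
        System.out.println(readToken("openai.token = my-api-token"));
    }
}
```

Failing fast on a missing token surfaces a configuration error at startup rather than as repeated failed summarization calls.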

The integration supports the following MIME types: text/plain, application/pdf, application/msword, application/vnd.openxmlformats-officedocument.wordprocessingml.document.
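A minimal sketch of how the supported MIME types relate to Blob Conversion: application/pdf is used as-is, and the other supported types would go through PDF conversion first. The class and method names are hypothetical, and stripping parameters such as "; charset=UTF-8" before the lookup is an assumption of this sketch.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: classifies MIME types for the summary pipeline.
final class MimeTypeSketch {

    static final String PDF = "application/pdf";

    // MIME types listed as supported in this documentation.
    static final Set<String> SUPPORTED = Set.of(
            "text/plain",
            PDF,
            "application/msword",
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document");

    // Assumption: drop parameters (e.g. "; charset=UTF-8") and compare case-insensitively.
    static String normalize(String mimeType) {
        if (mimeType == null) {
            return null;
        }
        int semicolon = mimeType.indexOf(';');
        String base = semicolon >= 0 ? mimeType.substring(0, semicolon) : mimeType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    static boolean isSupported(String mimeType) {
        String base = normalize(mimeType);
        // Set.of(...).contains(null) would throw, so check for null first.
        return base != null && SUPPORTED.contains(base);
    }

    // Supported non-PDF types are converted to PDF before text extraction.
    static boolean needsPdfConversion(String mimeType) {
        return isSupported(mimeType) && !PDF.equals(normalize(mimeType));
    }

    public static void main(String[] args) {
        for (String type : new String[] {"application/pdf", "application/msword", "image/png"}) {
            System.out.println(type + " supported=" + isSupported(type)
                    + " convert=" + needsPdfConversion(type));
        }
    }
}
```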

Concurrency and retry policies are configured in summary-stream-contrib.xml. By default, maxRetries is set to 10, and Nuxeo waits 10 seconds between retries. This value can be changed as required.
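The default retry behavior (a fixed 10-second delay, with maxRetries set to 10) can be approximated in plain Java as follows. This is not Nuxeo's stream retry implementation; the helper name is hypothetical, and treating maxRetries as the number of retries after the first attempt (so up to 11 attempts in total) is an assumption. The demo uses a short delay so it finishes quickly.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

// Hypothetical fixed-delay retry helper; not Nuxeo's stream retry policy code.
final class FixedDelayRetry {

    // Assumption: maxRetries counts retries after the first attempt.
    static <T> T call(Callable<T> task, int maxRetries, Duration delay) throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                // Wait a fixed delay before the next attempt, but not after the last one.
                if (attempt < maxRetries) {
                    Thread.sleep(delay.toMillis());
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        // Simulated task that fails twice, then succeeds.
        String result = call(() -> {
            attempts[0]++;
            if (attempts[0] < 3) {
                throw new IllegalStateException("simulated transient failure");
            }
            return "summary";
        }, 10, Duration.ofMillis(100));
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

With the documented defaults, a persistently failing call would be retried for roughly 100 seconds (10 retries, 10 seconds apart) before the error is surfaced.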