OpenAI Generative Image (Dall-E) Nuxeo Integration

Generates an image from a user-supplied text prompt using OpenAI Dall-E.

Configuration

Add the following properties to nuxeo.conf, replacing ${ORGANIZATION} and ${API_KEY} with your OpenAI organization ID and API key:

generative.ai.openai.organization=${ORGANIZATION}

generative.ai.openai.apikey=${API_KEY}
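
To illustrate how an organization ID and API key are typically used, the sketch below builds (but does not send) an image generation request. It is written in Python for brevity and is not taken from this plugin: the helper name build_image_request, the example prompt, and the n and size values are assumptions. The endpoint and header names follow OpenAI's public HTTP API.

```python
import json
import urllib.request

OPENAI_IMAGES_URL = "https://api.openai.com/v1/images/generations"


def build_image_request(organization, api_key, prompt, size="1024x1024"):
    """Build (without sending) a POST request for OpenAI image generation."""
    payload = {"prompt": prompt, "n": 1, "size": size}
    return urllib.request.Request(
        OPENAI_IMAGES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The two values configured in nuxeo.conf end up in these headers.
            "Authorization": f"Bearer {api_key}",
            "OpenAI-Organization": organization,
        },
        method="POST",
    )


# Placeholder credentials, for illustration only.
request = build_image_request("org-example", "sk-example", "a lighthouse at dusk")
```

Passing the request to urllib.request.urlopen would perform the call; the sketch stops short of that so it can be run without real credentials.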



OpenAI Summary (ChatGPT) Nuxeo Integration

Generates a text summary of a file using OpenAI ChatGPT.

Technical documentation

Architecture

This integration uses a Producer/Consumer pattern for asynchronous processing of the input file.
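
As a rough illustration of the Producer/Consumer pattern, the sketch below uses an in-process queue and worker threads. It is a generic Python example, not the integration's code: in the integration the queue role is played by Nuxeo Stream, and the names producer, consumer, and run_pipeline are hypothetical.

```python
import queue
import threading

SENTINEL = None  # tells a consumer that no more records will arrive


def producer(pages, records):
    """Put one (page_number, text) record on the queue per page."""
    for page_number, text in enumerate(pages):
        records.put((page_number, text))


def consumer(records, results, process):
    """Process records until the sentinel is seen."""
    while True:
        record = records.get()
        if record is SENTINEL:
            break
        page_number, text = record
        results[page_number] = process(text)


def run_pipeline(pages, process, consumer_count=2):
    records = queue.Queue()
    results = {}
    workers = [
        threading.Thread(target=consumer, args=(records, results, process))
        for _ in range(consumer_count)
    ]
    for worker in workers:
        worker.start()
    producer(pages, records)
    for _ in workers:
        records.put(SENTINEL)  # one sentinel per consumer
    for worker in workers:
        worker.join()
    return results


# str.upper stands in for the (slow) per-page processing step.
results = run_pipeline(["page one", "page two", "page three"], str.upper)
```

Because each result is keyed by page number, the order in which consumers finish does not affect how the pages can be reassembled afterwards.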

Input

The input for this integration is a file uploaded in Nuxeo. If the file is not in PDF format, it is converted to a PDF using the registered converters, a process called Blob Conversion.
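
The conversion rule above can be sketched as follows. This is an illustrative Python fragment, not the plugin's implementation: blobs are modelled as plain dictionaries, and ensure_pdf and fake_converter are hypothetical names standing in for Nuxeo's registered converters.

```python
PDF_MIME_TYPE = "application/pdf"


def ensure_pdf(blob, convert_to_pdf):
    """Return the blob unchanged if it is a PDF, otherwise convert it."""
    if blob["mime_type"] == PDF_MIME_TYPE:
        return blob
    return convert_to_pdf(blob)


def fake_converter(blob):
    # Stand-in for a registered converter; a real one would transform the bytes.
    return {"name": blob["name"] + ".pdf", "mime_type": PDF_MIME_TYPE}


text_blob = {"name": "notes.txt", "mime_type": "text/plain"}
pdf_blob = ensure_pdf(text_blob, fake_converter)
```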

Processes

The architecture consists of the following core concepts:

  1. Blob Conversion: This operation converts non-PDF files to the PDF format using registered converters.
  2. Nuxeo Stream (Bulk Actions): A Nuxeo Bulk Action processes text chunks and assembles the summary asynchronously. It involves:
    • SummaryServiceImpl.summaryProducer(): This function converts the document to PDF (if not already), splits text into pages, and produces records.
    • PageSummaryComputation: This component calls the OpenAI summary endpoint for each text chunk and saves the result to a Key-Value Store (KVS).
    • SummaryDoneComputation: Once all consumers are done, this computation reads the per-page summaries from the KVS, merges them in page order, and saves the merged summary.
  3. Event Listeners:
    • SummaryListener: This listener gets notified as soon as a document is created or modified. If the main blob is dirty, it fires an “extractSummaryEvent” event.
    • BulkSummarizeListener: This listener catches the “extractSummaryEvent”, converts the blob to a PDF, and splits the document into pages (producer role). It is configured to run in a dedicated queue named “bulkSummarizeListener”.
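
The per-page storage and ordered merge described above can be sketched with an in-memory stand-in for the KVS. This is a Python illustration under stated assumptions: the class PageSummaryStore, the key layout document_id:page_number, and the newline separator are all hypothetical and not taken from the integration.

```python
class PageSummaryStore:
    """In-memory stand-in for a key-value store holding per-page summaries."""

    def __init__(self, document_id, page_count):
        self.document_id = document_id
        self.page_count = page_count
        self._entries = {}

    def _key(self, page_number):
        return f"{self.document_id}:{page_number}"

    def put(self, page_number, summary):
        self._entries[self._key(page_number)] = summary

    def is_complete(self):
        return len(self._entries) == self.page_count

    def merged_summary(self):
        """Join the page summaries in page order, regardless of arrival order."""
        if not self.is_complete():
            raise ValueError("not all pages have been summarized yet")
        return "\n".join(
            self._entries[self._key(page)] for page in range(self.page_count)
        )


store = PageSummaryStore("doc-1", page_count=3)
# Consumers may finish in any order.
store.put(2, "Conclusion.")
store.put(0, "Introduction.")
complete_after_two_pages = store.is_complete()
store.put(1, "Method.")
merged = store.merged_summary()
```

Keying each entry by page number is what lets the final step restore the original order even when later pages are summarized first.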

Output

The output is a Nuxeo document with the merged summary of all PDF pages. A facet, “SummaryFacet”, is dynamically added to all documents that can be summarized. The summary is saved on the document in a new property: “summary:summary”.

Additionally, the following properties are set:

  • “summary:lastComputed”: The last date the summary was computed.
  • “summary:status”: Indicates the status of the summary generation – DONE if successful, ERROR if not.
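
A minimal sketch of how these three properties relate to the outcome of a run is shown below. It is written in Python for illustration; the function summary_properties is hypothetical, the ISO date format is an assumption, and the source does not state whether “summary:lastComputed” is updated on failure (this sketch updates it in both cases).

```python
from datetime import datetime, timezone


def summary_properties(summary=None, error=None, now=None):
    """Return the property values to write after a summary attempt."""
    now = now or datetime.now(timezone.utc)
    properties = {
        # Assumption: the timestamp is refreshed on success and on failure.
        "summary:lastComputed": now.isoformat(),
        "summary:status": "ERROR" if error else "DONE",
    }
    if not error:
        properties["summary:summary"] = summary
    return properties


fixed_time = datetime(2024, 1, 1, tzinfo=timezone.utc)
succeeded = summary_properties(summary="A short summary.", now=fixed_time)
failed = summary_properties(error="request timed out", now=fixed_time)
```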

Configuration

Follow the steps below to configure the integration:

  1. Add the OpenAI API token to nuxeo.conf as openai.token=<api-token>.
  2. The default configurations include:
summary.extraction.openai.url=https://api.openai.com/v1/completions
summary.extraction.openai.model=text-davinci-003
summary.extraction.openai.temperature=0.3
summary.extraction.openai.max-tokens=200
summary.extraction.openai.top-p=1
summary.extraction.openai.presence-penalty=0
maretha.summary.maxRetries=10
feature.summary.auto.generation.enabled=true
summary.extraction.enable.mime-types=text/plain,application/pdf,application/msword,application/vnd.openxmlformats-officedocument.wordprocessingml.document
  3. Enable or disable generating the summary automatically at document creation by setting feature.summary.auto.generation.enabled to true or false.
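
To show how the summary.extraction.openai.* defaults could map onto a request body, the sketch below parses the properties and builds a JSON payload. It is a Python illustration only: parse_properties and build_completion_payload are hypothetical helpers, the prompt wording is an assumption, and the payload field names follow OpenAI's public completions API rather than this plugin's code.

```python
import json

DEFAULTS = """
summary.extraction.openai.url=https://api.openai.com/v1/completions
summary.extraction.openai.model=text-davinci-003
summary.extraction.openai.temperature=0.3
summary.extraction.openai.max-tokens=200
summary.extraction.openai.top-p=1
summary.extraction.openai.presence-penalty=0
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    properties = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        properties[key.strip()] = value.strip()
    return properties


def build_completion_payload(properties, page_text):
    """Translate the configured values into a completions request body."""
    prefix = "summary.extraction.openai."
    return {
        "model": properties[prefix + "model"],
        # Assumed prompt wording; the real prompt is not given in this document.
        "prompt": "Summarize the following text:\n\n" + page_text,
        "temperature": float(properties[prefix + "temperature"]),
        "max_tokens": int(properties[prefix + "max-tokens"]),
        "top_p": float(properties[prefix + "top-p"]),
        "presence_penalty": float(properties[prefix + "presence-penalty"]),
    }


properties = parse_properties(DEFAULTS)
payload = build_completion_payload(properties, "Nuxeo is a content services platform.")
body = json.dumps(payload)
```

The body would be sent with POST to the configured url; a lower temperature such as the default 0.3 tends to make the generated summaries more repeatable.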

The mime-types eligible for summarization are controlled by summary.extraction.enable.mime-types. By default, these are text/plain, application/pdf, application/msword, and application/vnd.openxmlformats-officedocument.wordprocessingml.document.
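
A mime-type check against the comma-separated summary.extraction.enable.mime-types value might look like the following. This is a Python sketch with hypothetical helper names (enabled_mime_types, is_summarizable), not the plugin's implementation.

```python
ENABLED_MIME_TYPES_PROPERTY = (
    "text/plain,application/pdf,application/msword,"
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
)


def enabled_mime_types(property_value):
    """Split the comma-separated property into a set of mime-types."""
    return {item.strip() for item in property_value.split(",") if item.strip()}


def is_summarizable(mime_type, property_value=ENABLED_MIME_TYPES_PROPERTY):
    """Return True when the mime-type is listed in the property value."""
    return mime_type in enabled_mime_types(property_value)


pdf_allowed = is_summarizable("application/pdf")
png_allowed = is_summarizable("image/png")
```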

Concurrency and retry policies are configured in summary-stream-contrib.xml. By default, maxRetries is set to 10 and Nuxeo attempts a retry every 10 seconds. Adjust maxRetries as required.
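
The retry behaviour described above (a bounded number of retries with a fixed delay) can be illustrated as follows. This is a generic Python sketch, not how Nuxeo applies the policy from summary-stream-contrib.xml; the function call_with_retries is hypothetical, and it assumes maxRetries counts the retries made after the first attempt.

```python
import time


def call_with_retries(operation, max_retries=10, delay_seconds=10, sleep=time.sleep):
    """Run operation, retrying up to max_retries times with a fixed delay."""
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            attempt += 1
            sleep(delay_seconds)


calls = {"count": 0}
delays = []


def flaky_summary_call():
    # Fails twice, then succeeds, to exercise the retry loop.
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "summary"


# delays.append replaces time.sleep so the example runs instantly.
result = call_with_retries(flaky_summary_call, sleep=delays.append)
```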