3.5 Billion Documents

Navigating the Evolution: Next-Level Insights from Billion-Document Migrations

At Maretha Solutions, our team has the most real-world experience with the largest, longest-running, and most complex implementations of the Nuxeo Platform. We sometimes get called the “migration experts”. We sometimes get called the “DAM experts”. The truth is we all love helping clients achieve their goals with Nuxeo. This blog post details the pattern we developed and follow for the majority of the migrations we take on. Lessons from this pattern can be applied to any migration of data between distinct systems.

The screenshot above is a production Nuxeo Platform implementation that serves over 10,000 daily users on a continuous basis.

The Nuxeo Platform is capable of so much more than storing 3.5 Billion document records. The scale and complexity of migrating many distinct systems of record into one is a challenge. In my previous blog post about FileNet to Nuxeo migrations (which applies when migrating any system into Nuxeo) I described the general process the team followed and the results we achieved. Now, one of these clients has a single repository with more than 3.5 Billion Documents stored. 

The migration toolset we implemented on top of Nuxeo’s Bulk Action Framework is capable of ingesting 500 million documents over a weekend. The speed of ingestion with this toolset provides advantages to any migration. This is not an extension to an existing SDK making API requests wrapped into a product. This migration toolset transforms the Nuxeo Platform, exposing a set of automation endpoints that extract data from cloud data sources and then map and index the content into Nuxeo (and MongoDB and Elasticsearch) at a rapid and configurable pace.

The architecture of the Nuxeo Platform makes highly scalable automations like this possible. Here is how this migration toolset leverages the platform architecture to achieve these results.

When doing a migration into Nuxeo, there are two main components of document records that have to be migrated. One is the binary files. These come from the source system and are usually saved in local storage (at a datacenter) or in cloud-hosted storage (S3 or another object store). Nuxeo uses an S3 object store by default, so this is what we used to bring in the binaries. AWS has a range of tools to upload files into S3 buckets, including hardware devices and REST APIs. We set up a client access S3 bucket in the same availability zone as the Nuxeo cluster. This bucket is configured with PutObject permissions so clients can add files to the bucket in whatever way they choose.

As I mentioned in my previous blog post, binary files should be moved first. This is critical because the movement of binary files can be fast or slow depending on the size of the files and the network bandwidth of the originating system(s). Nuxeo secures each binary file by removing the extension from the filename and running the binary through a hashing algorithm.

We use a lambda inside AWS to process the inbound files from the originating system — removing the extension, hashing the filename, moving the file to the object store connected to the Nuxeo server, and producing a report file that contains the original filename, as well as the corresponding hash key.
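The hashing and reporting step can be sketched roughly as follows. This is a minimal illustration, not the code of the Lambda itself: it assumes the key is the MD5 digest of the file content, and the names `content_digest` and `report_row` are hypothetical. The S3 copy and the Lambda event handling are left out.

```python
import hashlib
import io
from typing import BinaryIO, Tuple

# Read 1 MiB at a time so a large binary never sits fully in memory.
CHUNK_SIZE = 1024 * 1024


def content_digest(stream: BinaryIO, algorithm: str = "md5") -> str:
    """Return the hex digest of everything readable from `stream`.

    Assumption: MD5 over the file content. Pass another hashlib
    algorithm name if the target repository is configured differently.
    """
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
        digest.update(chunk)
    return digest.hexdigest()


def report_row(original_filename: str, stream: BinaryIO) -> Tuple[str, str]:
    """Pair the original filename with its extension-free hash key."""
    return original_filename, content_digest(stream)


# Example with an in-memory stand-in for an uploaded file.
example_row = report_row("contract-001.pdf", io.BytesIO(b"example binary content"))
```

Each row of the report file then carries the original filename next to the hash key, which is what the metadata file needs in order to reference a binary that is already in the object store.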

To arrange the metadata for each document record in an effective and scalable way, we chose CSV files, as they are smaller and more secure than other options. JSON files are also a viable option because the format carries native data types (strings, numbers, booleans); however, the file size is larger. The ingestion file type is a configuration that can be made in the code of the platform add-on.

With either file type, the best way to transmit these files to the Nuxeo cluster is by streaming them from a cloud-connected S3 object store. In our testing, streaming from S3 is faster than reading the file from a locally attached EFS. This is because the file does not need to be fully loaded in memory and can be processed in chunks.
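To illustrate what processing in chunks means here, the sketch below reads CSV rows lazily from any line iterator and groups them into fixed-size batches, so only one batch is held in memory at a time. It is an illustration in Python rather than the add-on's code. The function name `iter_batches` and the batch size are assumptions, and obtaining the line iterator from S3 is left out.

```python
import csv
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def iter_batches(
    lines: Iterable[str], batch_size: int = 1000
) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most `batch_size` CSV rows, reading lazily.

    `lines` can be any iterator of text lines, for example lines decoded
    from a streamed object, so the whole file is never loaded at once.
    The first line is treated as the CSV header.
    """
    reader = csv.DictReader(lines)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch
```

Each batch could then be turned into one or more messages for downstream processing, which keeps memory use flat regardless of the size of the metadata file.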

Stream Import Architecture

Here are the simple steps that the team follows to run an import of any size into the platform:

  1. Users use AWS CLI (or any supported method) to upload CSV (or JSON) files created from the exporting system into the client access S3 bucket.
  2. Lambdas evaluate the content that is added to the client access S3 bucket and automatically move the metadata and hashed binary files into the S3 bucket connected to the Nuxeo cluster.
  3. An automation operation is triggered by API that references the metadata file that will drive the import.
  4. Nuxeo begins streaming the metadata file from S3 and breaking the content into Kafka messages for processing.
  5. Messages are processed by streams — using the native Nuxeo Bulk Action Framework — and documents are created with reference to the already existing binary files.
  6. In the background, the content is indexed into Elasticsearch (or OpenSearch) after the data is created in MongoDB.
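Step 3 above can be pictured as a single call to Nuxeo's automation REST endpoint. The sketch below only builds the request and does not send it. The operation ID `Example.StreamImport` and the parameter name `metadataFile` are hypothetical placeholders, since the real names belong to the add-on. The host is made up, authentication is omitted, and the `/api/v1/automation/` path should be checked against the Nuxeo version in use.

```python
import json
import urllib.request


def build_import_request(
    base_url: str, operation_id: str, metadata_key: str
) -> urllib.request.Request:
    """Build (but do not send) a POST that triggers an automation operation.

    `operation_id` and the `metadataFile` parameter name are hypothetical;
    substitute the names exposed by the installed add-on.
    """
    body = {"params": {"metadataFile": metadata_key}, "context": {}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/automation/{operation_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example: reference a metadata file already moved to the cluster's bucket.
request = build_import_request(
    "https://nuxeo.example.com/nuxeo",
    "Example.StreamImport",
    "imports/batch-001.csv",
)
```

Sending the request (for example with `urllib.request.urlopen`) would start the import. Steps 4 to 6 then run inside the platform without further client involvement.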

This pattern is now being used by several different clients of Nuxeo and Nuxeo Cloud. Not only does it come with a speed advantage over other import options, it is also versatile, capable, and adjustable to any client’s needs.

In the realm of digital content management, the journey we’ve embarked on with Nuxeo stands as a testament to innovation, scalability, and the transformative power of technology. Migrating over 3.5 billion documents into a single, unified repository is not just a technical achievement; it’s a strategic leap forward in how we access, manage, and leverage critical content. With more than 10,000 daily users now relying on this robust system, we’ve moved beyond mere storage to creating a dynamic, efficient ecosystem for content management. The success of this project, underscored by the high-speed, flexible, and efficient toolset we developed, illustrates not just the capabilities of the Nuxeo Platform but also the vast potential it holds for organizations seeking to revolutionize their content management strategies. As we continue to refine and deploy this pattern across various clients, it’s clear that the intersection of cloud technology, data migration, and content management is an evolving landscape ripe with opportunities for innovation and growth. This journey with Nuxeo is more than a milestone — it’s a blueprint for the future of enterprise content management, where speed, scalability, and precision converge to create unparalleled business value.

Contact us for more information about our existing migration tools.

Unleashing the Power of Generative Image AI in Nuxeo

Learn about our published add-on here: https://maretha.io/generative/
Try it out on the Nuxeo Marketplace here: https://connect.nuxeo.com/nuxeo/site/marketplace/package/nuxeo-generative-ai-package?version=1.0.0

In the digital age, businesses accumulate vast amounts of data in various formats, such as images, videos, 3D assets, and documents. Managing and utilizing these assets effectively is key to every business’s success. One tool that organizations employ to handle this challenge is Digital Asset Management (DAM) systems. Nuxeo, a well-known DAM solution provider, stands out as a significant player in this field, known for its ability to manage, find, and share digital assets. 

However, there is a new technology trend that can elevate DAM solutions like Nuxeo to new heights — Generative Image Artificial Intelligence (AI). This technology can synthesize realistic images that did not exist before, based on training from an enormous database of real images. This transformative technology opens up a world of possibilities in the DAM ecosystem. 

Maretha Solutions has recently released a Nuxeo add-on to leverage the power of generative image AI within the context of the Nuxeo DAM. This add-on generates images based on user parameters quickly and seamlessly to save time in creative areas of the organization. Here is a short video demonstration of how it works: 

Let’s delve into the benefits of integrating Generative Image AI into Nuxeo’s DAM.

1. Boosting Content Generation and Creativity

In an age where visuals are king, companies are under constant pressure to generate visually engaging content. It is a daunting task to keep up with the ever-growing demand. By harnessing the power of Generative Image AI, organizations can automate and expedite this process. Generative AI can create new, unique images based on specific guidelines, reducing the manual workload, and enabling faster content generation.

In addition, Generative Image AI can fuel creativity by providing designers with unique image ideas. By inputting certain parameters or themes, the AI can generate image suggestions, providing a springboard for creativity and inspiration. Therefore, the integration of this technology can significantly enhance the creative process.

2. Enhancing Personalization 

Today, personalization is a crucial part of customer engagement strategy. Businesses are always on the lookout for innovative ways to tailor their offerings to individual customer preferences. With Generative Image AI integrated into Nuxeo’s DAM, businesses can craft personalized images or creatives for their audience. This kind of personalized visual content can resonate more powerfully with the audience, thereby enhancing engagement and conversion rates. 

3. Cost and Time Efficiency

One of the main advantages of implementing AI in any field is the reduction in time and cost. Generative Image AI is no exception. By integrating it into Nuxeo’s DAM, businesses can save on resources allocated to content creation. It reduces reliance on external agencies and also cuts down the time taken from concept to execution of design ideas. Consequently, it allows the resources saved to be deployed in other critical areas.

4. Future-proofing Your Business

As AI continues to evolve, businesses must embrace this technology to stay competitive. By integrating Generative Image AI into DAM systems, organizations can keep up with the latest technological trends and future-proof their operations. It ensures businesses are prepared for future advancements and can adapt quickly to changing market dynamics. 

In conclusion, the integration of Generative Image AI into Nuxeo’s DAM system can provide numerous benefits. It can boost content generation and creativity, enhance personalization, save time and cost, and future-proof businesses.

Although the implementation of Generative Image AI requires an initial investment in technology and training, the long-term benefits make it a worthwhile investment. By taking advantage of this innovative technology, businesses can leverage their digital assets more effectively and stay ahead of the competition in this digital age. 

Generative Image AI is not just a futuristic concept; it’s a transformative tool that’s reshaping the digital landscape today. It’s time to rethink how we manage and use digital assets. And integrating Generative Image AI with a proven DAM system like Nuxeo is a significant step in that direction.

Unleashing AI: How ChatGPT Revolutionizes Document Summarization within the Nuxeo Platform

In a world driven by the proliferation of data, the ability to effectively manage and interpret information is paramount. This is particularly true in the realm of contract management, insurance claim administration, policy evaluation, legal document review, and other documentation-rich sectors. With expansive volumes of text to review and comprehend, professionals in these fields often find themselves overwhelmed and slowed down. 

However, thanks to innovative breakthroughs from OpenAI’s ChatGPT, we are witnessing a transformative shift in how document summarization is approached, and this paradigm shift is poised to save Nuxeo users hours of precious time. 

Maretha Solutions has recently released a Nuxeo add-on to leverage the power of ChatGPT within the context of the Nuxeo Web UI. This add-on summarizes text documents such as PDF, DOCX, and RTF files quickly and seamlessly, saving time for business users who work in document review areas of the organization. Here is a short video demonstration of how it works:

Maretha Solutions Document Summary Nuxeo Platform Add-On

Unveiling the Power of ChatGPT

To fully appreciate this innovation’s potential, let’s first understand what ChatGPT is. This cutting-edge artificial intelligence (AI) model offers capabilities such as answering queries, writing essays, translating languages, and even generating creative content. Notably, it excels in processing and summarizing large volumes of text, making it a game-changing tool for professionals who routinely work with substantial documentation.

But how does it do this? Trained on a diverse range of internet text, ChatGPT leverages a deep learning model known as the Transformer, which allows it to generate coherent, relevant summaries based on the context of a given text. Essentially, the model identifies key points in a document, condenses them, and presents them in a manner that maintains the original meaning but in a significantly reduced form. This capability is now being utilized to provide substantial time-saving benefits to users of the Nuxeo Platform.

The Nuxeo Platform: An Arena for Digital Transformation

Nuxeo, a leading Cloud Native Content Services Platform, empowers its users to manage, locate, and utilize content with unparalleled ease and efficiency. The platform is highly favored by organizations dealing with a plethora of digital documents, including but not limited to contracts, claims, and policy documents.

Traditionally, reviewing such extensive documents required manual effort, and consequently, significant time investment. As organizations increasingly digitalize, this method becomes impractical and inefficient. With the integration of AI technologies, Nuxeo has the potential to completely revolutionize this space.

ChatGPT and Nuxeo: A Time-Saving Synergy

Incorporating ChatGPT’s document summarization capabilities into the Nuxeo platform can save users hours of time traditionally spent reviewing contracts, policies, and other text-heavy documents. By effectively summarizing these documents, ChatGPT allows users to grasp key information without reading through each line of text. This not only expedites the review process but also lowers the risk of overlooking a critical point, which can happen in a long manual review.

For instance, in the case of contract management, users often need to go through several iterations of a document, each usually packed with legal jargon and complex clauses. By using ChatGPT, Nuxeo users can receive a concise summary highlighting key elements such as the contract parties, obligations, rights, and any specific terms or conditions. This significantly reduces the time spent on reviewing contracts, thereby accelerating deal cycles and improving operational efficiency.

Similarly, for policy review, users can quickly understand the critical aspects of a policy without wading through the entire document. Whether it’s internal policies for HR or compliance or external policies for regulatory purposes, a comprehensive summary generated by ChatGPT can provide the necessary insights more quickly and efficiently than traditional manual methods.

Looking Ahead: AI and the Future of Document Management

The integration of artificial intelligence’s revolutionary natural language processing within Nuxeo represents a significant step forward in the realm of document management. However, it is just the beginning. As AI continues to advance, we can expect more business use cases that will save users countless hours of time when using the Nuxeo platform as a content repository for their business documents.

Speed up your Nuxeo 2021 Upgrade: hands on advice and useful resources

Why upgrade to LTS2021? While there are numerous benefits to upgrading to the latest version (performance improvements, bug fixes, new features, etc.), according to https://www.nuxeo.com/legal/supported-versions/ the support for LTS 2019 (or 10.10) is ending on 04/04/2023. And this date is right around the corner!

2023 tech resolutions – Top 10 costly mistakes to avoid – even for Nuxeo experienced developers

It’s January 2023, so what better time to start setting some good tech resolutions? As always, a few of my yearly tech resolutions are to write better code, optimize more, save $ on the infrastructure cost and, of course, learn as many new things as possible (integrate with ChatGPT, anyone?).

On the topic of writing better code: maybe now is an ideal time to refactor and avoid making some mistakes that could potentially lead to a lot of $ spent on infrastructure and poor performance of your application overall?

A simple, dependency free Event Bus for Nuxeo Web UI / Polymer / any JavaScript project

Let’s start with a discussion on component coupling, communication, and when to use an event bus. Polymer has a few communication options available, and the key is finding the optimal coupling for each situation:

How to add custom logic to existing Nuxeo Web UI buttons

Let’s say you want to add some extra logic for the click event on a button. The first problem is that there are already some functions attached to the click event of the button.
Adding a new event handler does not guarantee that it will run first.

Custom rendering per mime-type in Nuxeo

In this article I am going to focus on a rendering issue that one of our clients at Maretha.io encountered and I will detail the steps performed to find a solution.

Converting Nuxeo LTS2021 Web UI JavaScript components to HTML components for import in a nuxeo-project-bundle.html

Unbundled (original) Nuxeo LTS2021 Web UI JavaScript components are defined in JS files ready to be bundled with Webpack into the Nuxeo Web UI JS bundles. They have “import” statements which bring in the necessary components & behaviors.

How to overwrite Nuxeo Web UI bundled JS components in Nuxeo LTS2021

This is a solution for successfully overwriting Nuxeo LTS2021 Web UI bundled JS components without getting the error Uncaught DOMException: Failed to execute ‘define’ on ‘CustomElementRegistry’: this name has already been used with this registry.