Top five things you need to know about Archiving with Camunda

Note that a Camunda "process archive" is a deployment unit that bundles multiple deployments of your application; that is a separate concept from what we discuss here. This episode is about archiving Camunda's historical process data.


In this podcast episode, Stuart and Max explore the five things you need to know about archiving with Camunda when using the open-source Camunda Platform, an enterprise platform for workflow and decision automation.


What is the context here?

The context here is that you’re ready to deploy your Camunda application to production, but it’s your first time, and you’re wondering how to archive historical data. So that’s what we’re going to talk about today: the care and feeding of Camunda’s archival data.


Why do you need to do archiving at all with Camunda?

Camunda tracks all sorts of information about your processes, tasks, workflows, integrations, decisions, start times, end times, execution, delays, etc. That data is critical to your business, and it cannot be lost under any circumstances. So, while design considerations must take Camunda runtime performance into account, they must also ensure that data loss doesn't occur.

If you don’t archive that information, and do so carefully, you’re likely to lose it. Worse, you could cause a production failure as the history tables grow unchecked. Camunda’s history tables were designed with archival and cleanup in mind, and if you let them grow indefinitely, they will eventually break your production system.


What does this mean at a tactical, hands-on-keyboard level? How do you get started?

You would typically write a custom history event handler. A Camunda history event handler can read, echo, and even scrub HistoryEvent instances in an easily customizable way. For example, you can have a custom history event handler that *asynchronously* transmits history events to some kind of queue, like Kafka, from which they can then be disseminated via subscriptions to any system that needs that data. Those subscribers can then slice, dice, and report on the data to your heart’s content.
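The asynchronous hand-off can be sketched as follows. This is a simplified, self-contained illustration, not Camunda's actual API: the `HistoryEvent` class here is a minimal stand-in for Camunda's real `HistoryEvent`, and an in-memory list stands in for the Kafka topic. A real implementation would implement Camunda's `HistoryEventHandler` interface and publish to Kafka from a background thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal stand-in for Camunda's HistoryEvent class; the fields are illustrative only.
class HistoryEvent {
    final String processInstanceId;
    final String eventType;
    HistoryEvent(String processInstanceId, String eventType) {
        this.processInstanceId = processInstanceId;
        this.eventType = eventType;
    }
}

// Sketch of an asynchronous history event handler: handleEvent only enqueues,
// so the engine's transaction is never blocked on downstream I/O.
public class AsyncHistoryHandlerSketch {
    private final BlockingQueue<HistoryEvent> buffer = new LinkedBlockingQueue<>();
    final List<String> transmitted = new ArrayList<>(); // stands in for the Kafka topic

    public void handleEvent(HistoryEvent event) {
        buffer.offer(event); // constant-time hand-off on the engine thread
    }

    // Drain loop; in production this would run on a dedicated background thread.
    public void drainOnce() {
        HistoryEvent e;
        while ((e = buffer.poll()) != null) {
            transmitted.add(e.processInstanceId + ":" + e.eventType);
        }
    }

    public static void main(String[] args) {
        AsyncHistoryHandlerSketch handler = new AsyncHistoryHandlerSketch();
        handler.handleEvent(new HistoryEvent("proc-1", "start"));
        handler.handleEvent(new HistoryEvent("proc-1", "end"));
        handler.drainOnce();
        System.out.println(handler.transmitted); // prints [proc-1:start, proc-1:end]
    }
}
```

The key design point is that the engine-facing method does nothing but enqueue; everything slow happens on the consumer side of the queue.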


Does this have to be done asynchronously?

The history event handler is invoked every time you move from one node to another in a process. If it’s also taking the time to update a database synchronously on each of those steps, performance is really going to suffer. We’ve seen customers turn step-transaction times that were previously in the milliseconds into three-minute (or longer) invocations at each step, because they were making a series of RESTful calls synchronously. That means the step won’t be complete until you hear back from the invocation. The issue is that invocations of a history event handler's handleEvent method are blocking, meaning that the Camunda engine will not complete its transactions (which save both runtime and history data together) until the handleEvent methods in all configured history event handlers have completed.
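A small simulation makes the cost concrete. This is a sketch under stated assumptions, not engine code: `completeStep` stands in for the engine finishing one step's transaction, and a `Thread.sleep` stands in for a synchronous remote call inside the handler.

```java
// Sketch showing why a slow, synchronous handleEvent stalls the engine:
// the step's transaction cannot finish until every handler returns.
public class BlockingHandlerDemo {
    interface Handler { void handleEvent(String event); }

    // Simulates the engine completing one step: runtime and history state
    // are committed together, so the handler runs inside the critical path.
    static long completeStep(Handler handler) {
        long start = System.nanoTime();
        handler.handleEvent("activity-end");
        // ... engine would commit runtime and history state here ...
        return (System.nanoTime() - start) / 1_000_000; // elapsed ms
    }

    public static void main(String[] args) {
        Handler slowRemoteCall = e -> {
            try { Thread.sleep(200); } // stands in for a synchronous REST call
            catch (InterruptedException ignored) { Thread.currentThread().interrupt(); }
        };
        Handler enqueueOnly = e -> { /* hand off to a queue; returns immediately */ };

        System.out.println("slow handler step: ~" + completeStep(slowRemoteCall) + " ms");
        System.out.println("enqueue-only step: ~" + completeStep(enqueueOnly) + " ms");
    }
}
```

Multiply that per-step penalty across every transition in every running process instance and the motivation for the asynchronous hand-off is clear.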


How is that actually communicated? Using RESTful APIs?

Yes, often, but even an asynchronous RESTful invocation uses threads that would otherwise be working for your process engine, and if you’re doing that on every single step, you could be opening a Pandora’s box. So if REST APIs are used to communicate history event information, you have to engage in pretty thorough performance testing to ensure that the REST API calls won't significantly impact runtime performance (if they are made in the main threads), and that they won't perform slowly enough, even when made in separate threads, to cause resource starvation within the runtime Camunda JVMs.


Can this risk be avoided?

Yes: separate archiving and deletion into two distinct steps. To minimize the risk of data loss, we usually provide a supplemental archiving and deletion process that complements the custom history event handler. This process archives the Camunda history data at regular intervals, copying it to another Camunda instance with an identical schema, and deletes that data from the source schema only once the archival has completed and been confirmed successful.

It's really important that no deletions ever occur until a successful archival is confirmed.
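The copy, verify, then delete ordering can be sketched as below. This is an illustration of the safeguard, not a production implementation: in-memory maps stand in for the two identical history schemas, where a real job would run JDBC batches against history tables such as ACT_HI_PROCINST.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the archive-then-delete safeguard: rows are deleted from the
// source only after the copy has been verified. Maps stand in for the two
// identical Camunda history schemas.
public class ArchiveThenDelete {
    static int archiveBatch(Map<String, String> source, Map<String, String> archive) {
        // 1. Copy every row to the archive schema.
        archive.putAll(source);
        // 2. Verify the archival succeeded before touching the source.
        for (Map.Entry<String, String> row : source.entrySet()) {
            if (!row.getValue().equals(archive.get(row.getKey()))) {
                throw new IllegalStateException("Archive verification failed; aborting delete");
            }
        }
        // 3. Only after confirmation is deletion from the source safe.
        int deleted = source.size();
        source.clear();
        return deleted;
    }

    public static void main(String[] args) {
        Map<String, String> source = new HashMap<>();
        source.put("hi-proc-1", "completed");
        source.put("hi-proc-2", "completed");
        Map<String, String> archive = new HashMap<>();
        int deleted = archiveBatch(source, archive);
        System.out.println(deleted + " rows archived and deleted"); // prints 2 rows archived and deleted
    }
}
```

If the verification step fails for any row, nothing is deleted, which is exactly the property that protects you from data loss.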

The use of the identical schema lets us use another Camunda runtime instance to reference that archival database and facilitate inbound REST API calls for data retrieval. It’s basically a failsafe that can really save your butt when you need it.


What are the top five things you need to know about Archiving with Camunda?

This podcast discusses five things you need to be aware of as best practices when archiving with Camunda. Stuart and Max discuss the top ways to archive and safely clean up Camunda history data, including the need for archival capabilities, the care and feeding of Camunda’s archival data, the custom history event handler, the need for asynchronicity, and how to avoid the associated risks.