Massive Gain in Efficiency

By Alexander Lueders, Douglas Merrill, Vegard Stenstad and Henrik Steudel

Product development team members at entimo combined process optimizations to generate a 27-fold increase in efficiency. Find out how.

In a mountainous daytime landscape, a small wooden bridge spans the rocky bed of a swiftly flowing stream. It can carry only a small amount of foot traffic.

A key feature in entimICE is traceability which, among other things, keeps track of which files have been created or updated by a program run. (A program in this context is an end-user business program such as SAS/R, or an entimICE script. The program may have many different versions, with each version reflecting a change made to its source or its parameters.) When a user runs a program, entimICE creates a program execution for the current program version. This execution contains information about which objects have been read or written or deleted. Usually the objects that the program uses are files, but entimICE also keeps track of macro calls and database view usages.

The challenge

One of the entimICE software development teams at entimo had a task to create a tool that would migrate data from an old relational data model to a new document-based data model for a large (1 TB or more) volume of data. Further, the tool needed to be able to validate that all of the data had been transferred from the old system to the new one.

“A program execution trace database is very complex,” says Vegard Stenstad, who led the effort. “There are nested executions, as well as used objects and conflicting executions.” Further, the tool had to be able to stop and resume during the migration. It needed to work as quickly as possible.

Understanding the tool requires looking at the use case and the existing database structure. When an entimICE user runs a program in an area where tracing is enabled, the application records everything that the program does. That includes recording information about the program itself, any files used for input or output, log files, database connections, and more. This data is then stored in a relational database (Oracle).

In a production environment, this relational database can obviously grow very large over time. Further, modern environments produce data in numerous different structures. The program execution trace under consideration here distributes data across multiple independent data stores and works with varied data structures. In this case, the team chose MongoDB because its document-based model is a better fit for execution trace records than a relational database. The migration achieves two important goals; it reduces volume and load on the central classic Oracle database, and at the same time it benefits from MongoDB’s built-in clustering capabilities.

A simplified schematic view of a relational database.

Fig. 1
A very simplified view of the relational database

Building the tool

One of the target use cases for the tool was a set of roughly five million program executions. Those executions involved an average of 43 objects, so that led to as many as 550 million lookups in the complete task. These significant numbers meant that the team not only had to find a way to do the migration, the team members had to find a way to get the task to run within an amount of time that met business and operational requirements.

The tool accomplishes its goal in five steps:

Reading the data from the old database
Cleaning the data
Transforming the data
Persisting the data
Validating the migrated data

The last step, validating the data, is crucial because it ensures that the integrity of the data has been kept. Because of the importance of validating data and data integrity, those were the team’s initial priorities and performance took a back seat. As a result, the first implementation was slow: it took up to 14 days to manage a complete migration of around 1 TB of data.

Each step toward the goal had a corresponding part in the tool. First, a pre-processor initialized the conditions that the tool required. The reader and chunk iterator dealt with organizing and reading chunks of data from Oracle, as well as stopping and starting. The transformer did the actual work of transforming the data. The persister was initially not separate, but was split out during the process of optimization. It stored the data to the new database. The validator ensured that all data was transferred correctly.

A table showing the improvements gained from each type of optimization.

Fig 2.
Improvements gained by optimization

How does the tool work?

To migrate the data, the team initially built a multi-threaded tool using an iterative chunking approach to the data, and in which each thread is independent, handling its own reading, processing, and writing. There was a synchronized singleton that coordinated among the threads, and distributed the chunks of data to the process.

The team divided the tool into five parts: a pre-processor, a combined reader and “chunk iterator,” a transformer, a persister, and a validator. The pre-processor initialized the conditions that the tool required. The reader and chunk iterator dealt with organizing and reading chunks of data from Oracle, as well as stopping and starting. The transformer did the actual work of transforming the data. The persister was initially not separate, but was split out during the process of optimization. It stored the data to the new database. The validator ensured that all data was transferred correctly.

Why was it important to have a pre-processor? Within entimICE, nested executions could lead to a situation in which some executions could be deleted, so developers made a design choice to clone executions. This approach led to duplications, which should be ignored in a migration. The pre-processor was built to identify all of the clones, leading to a list of top-level executions for the migration, which thus picks up all of the nested executions as well. The pre-processor creates an “ignore list” of the clones, which do not require migration.

The “chunk iterator” was a synchronized singleton that provides each thread with a list of executions to process in chunks of 1000 executions. The chunks were stored in MongoDB, and any errors were noted. When the tool was started, it recorded and stored a max execution ID, so that it would not process any data that was added after the start of the tool. The reader was an extremely simple part of the tool, just requesting chunks from the chunk iterator and calling the transformer to process the chunk.

The transformer translated old entities to new entities; that is, Oracle execution traces were transformed into MongoDB data execution traces. Along the way, it improved the handling of program execution data by removing irrelevant objects, correcting program versions, and so forth. Upon completion of the translation, it added the processed chunk of 1000 executions to the persist thread queue.

The persister was originally part of the transformer, but during optimization the team moved it to its own thread. It persisted data to the MongoDB and marked chunks as completed. It was a very simple and reusable tool, one that used a blocking queue to receive from the processing threads. As long as there was data in the queue and the readers were going, it would keep looping and whenever it found something it would write that to MongoDB.

The final part of the tool was the validator, which was run after the migration was completed. It built a list of program executions in Oracle that should have been transferred to MongoDB and compared that list against what was actually written to MongoDB.

Improving performance

The team found ways to enhance the tool, and wound up improving its performance massively. They employed five major techniques to optimize:

Increasing JDBC driver fetch size
Bulk writing of data to the new document database
Pre-fetching data & using entity graphs when possible to avoid lazy loading
Pre-loading commonly used lookup tables
Caching of commonly used data

The first optimization concerned the ignore list that the pre-processor was building. That list initially took almost four hours to generate. Delving into the parameters of the processes involved, the team found that the default query fetch size for Oracle was 10, so the pre-processor was going back and forth an immense number of times to fetch small queries. They re-set the size to 100,000. After that change, the ignore list took less than five minutes to generate. That near fifty-fold improvement in the pre-processing had a medium-sized effect on the overall task. The team gained some efficiencies in the validator because it also queried Oracle directly, so setting the fetch size higher improved performance in validation as well.

In a daytime mountainous landscape a robust railway bridge crosses a steep gorge. Trains that cross on it can carry many more people than the wooden bridge depicted above.

After the fetch size was increased, writing to MongoDB became the bottleneck for generating the ignore list. The tool was sending what to write as a task list, but the actual writing was still happening one by one. Changing the method from “saveAll” to “insert” forced bulk writing, which removed writing speed as a constraint. As part of this second optimization, the team also moved persisting to its own thread. When combined with bulk writing, time required for persisting took dropped from 90-120 seconds down to 2-6, depending on the chunk size. (The times discussed in this article are all related to the test hardware and could be changed by using production hardware. The conclusions for the configuration and software usage remain valid, and the points about optimization can be generalized beyond any particular configuration.) The team decided this was sufficiently efficient. These optimizations together also had a medium-sized effect on overall performance.

The third optimization turned out to be a game-changer for the team. In the original design, the tool looped over program executions and loaded them one by one, also loading any nested executions. The team changed to bulk loading parent executions along with the first level of nested executions. “Fetching them for up to 1000 executions at the same time, which is a maximum set by a limit within Oracle, we also introduced EntityGraph, which tells part of the Oracle software that it should pre-fetch child entities and grandchild entities when it is loading a program execution,” explains Alexander Lueders, who was part of the team that developed the tool. Memory management became a problem with the pre-fetching, so the team reduced the size of each chunk. Those changes in design reduced migration time from 14 days to four days. The team had moved the tool much closer to the performance needed for practical use. “That was a huge impact,” adds Alexander.

The fourth optimization involved pre-loading data. The team introduced pre-loading because when the tool fetched parent and child executions, it wound up getting clones. The solution was to find the original and pre-load that execution. Taking this approach also reduced the query to only the data they needed and not the entire entity, so the tool could afford to store it in memory. The data was pre-loaded during initialization and a map was built in memory. Building the map took about 30 second, and saved millions of queries. This was cleaner and more efficient, but with a minor effect on overall performance.

The fifth optimization brought another big jump in performance. “We had two lookups – before and after the execution – for the version of for every object that was used in an execution,” explains Henrik Steudel, another team member. “With an average of 43 objects per execution that added up to a total of 550 million lookups for our database of 5 million executions, and because of the massive amount of versions pre-caching was not feasible.” The team chose to use Caffeine Cache with Spring to implement a caching strategy. The lookup time for each one dropped from 1.6ms to 0.3ms, a five-fold improvement. Those accumulated gains dropped the migration time from four days to half a day. The tool was now processing about as fast as it could write to MongoDB.

With its process of optimization, the team improved performance by a factor of roughly 27, going from 14 days for a migration to 12 hours.

Conclusions

What are the general lessons learned for optimizing and improving performance?

Understanding the data and data structures is key to optimizing. “Be sure to know the settings and limitations in the underlying technology,” says Vegard. “In this case, key parameters turned out to be the Oracle fetch size, IN query limit, and MongoDB saveAll vs insert.”

Using the right tool for the situation and combining multiple optimizations led to huge benefits. Analyzing required entity data that was needed for loading revealed ways to reduce overhead. Optimization can be a trial and error process that takes time.

Working on real data demonstrated how large datasets that grow over time tend to accumulate anomalies that have to be considered during migration. There’s nothing like real-world data to bring out corner cases.

By the end of the process, the team had delivered a migration tool that was showing significant improvement from its first iteration. The previous system could be migrated to an underlying MongoDB system without affecting the rest of the entimICE system, gaining significant performance improvements in the overall system and ensuring compatibility with future updates or service packs.

* Upper image by Wikimedia user Famberhorst under a Creative Commons Attribution-Share Alike 4.0 International license. Lower image released to the public domain by its creator.