
On a quest of reducing Jenkins build time – Part 2

Previous – On a quest of reducing Jenkins build time – Part 1.

With all our efforts, we had managed to get our build pipeline time down to under 12 minutes. But our quest still continues…

[Screenshot: build pipeline]

We were using CVS, and our build job was taking a couple of minutes just to determine the change log. The next target was to check whether migrating from CVS to SVN would help us reduce our build times further.

Matthew Meyers (our infrastructure guru at IDeaS Inc.) had set up a cool Jenkins infrastructure in the IDeaS data center for our next-generation product. During a conversation, he suggested an experiment: migrate our local Jenkins infrastructure to the data center and migrate from CVS to SVN, and see whether that reduces our build times further.

Matthew created a dedicated Jenkins slave for this effort, migrated the repository from CVS to SVN and set up this parallel experimental infrastructure. After some initial hiccups, we got all the tests running fine, and we were delighted to see that the total build pipeline time had come down to less than 8 minutes. The SVN migration, bigger and better machines, and the SAN infrastructure had all helped reduce the build timings.

We finalized a date to cut over to this new infrastructure, and the cutover went through fine too. Now we have a nice consolidated Jenkins infrastructure in our data center.

This migration brought its own share of new lessons…

robocopy/rsync

Matthew introduced me to robocopy, and robocopy is so cool. Earlier we were using a simple copy command to copy workspaces, databases and files in general. Robocopy can mirror the files in a source folder to a destination folder, and the cool thing is that it automatically skips files that have not been modified. This saves a lot of time in file operations.
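For illustration, a mirror run looks something like this (the paths are placeholders): /MIR mirrors the whole tree, skipping unchanged files and deleting destination files that no longer exist in the source, while /NP /NFL /NDL just keep the log quiet. On Linux, rsync gives the same effect.

robocopy C:\head-job\workspace C:\test-job\workspace /MIR /NP /NFL /NDL

rsync -a --delete /head-job/workspace/ /test-job/workspace/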

This ability helped us add two more test jobs to our build pipeline: 1) REST Test and 2) Business Driven Tests. We could not do this earlier, as both jobs required us to deploy the application and prepare a larger pre-populated database before running the tests, which used to be a time-consuming activity.

Could not reserve enough space for object heap

As we added the two new test jobs to run in parallel, jobs started failing with the error “Could not reserve enough space for object heap”. So far I had faced this issue only when exceeding the -Xmx<size> (maximum Java heap size) limit of a 32-bit process. In this case it was a 64-bit process, we had 32 GB of RAM available, and while the jobs were running I could see ample memory still free on the box.

After spending a couple of hours googling and observing the Task Manager while the jobs were running, I learnt something new: the Task Manager has a section called “Committed Memory”.

[Screenshot: Task Manager]

I observed that even though the machine was showing a lot of free memory, I got the memory error whenever the committed memory crossed the physical memory threshold of 32 GB. You can see both the actual memory consumed and the committed memory per process in the Task Manager.

[Screenshot: committed memory per process]

Using the Task Manager, I found that most of the Java processes and some MySQL processes were each consuming 2+ GB of committed memory. After reducing the value of -Xmx, the committed memory of the Java processes went down.

After reducing the MySQL variable key_buffer_size, the committed memory of the MySQL processes went down too.
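For illustration, these are the two knobs involved; the values below are placeholders, not the exact figures we settled on:

# JVM option for the test processes: cap the maximum heap
-Xmx1024m

# my.ini, [mysqld] section: shrink the MyISAM key cache
key_buffer_size=256M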

Finally, after bringing down the committed memory, all the jobs started running in parallel without any issues.

So even after adding two more test jobs, our build pipeline time stands at 10 minutes.

[Screenshot: updated build pipeline]

Previous – On a quest of reducing Jenkins build time – Part 1.

 

CPU profiling to the rescue

Prev – Test on smaller data set

I have been using the Eclipse IDE for my development for many years. A couple of years ago I looked for a good CPU profiling plugin for Eclipse and did not find one, so just for profiling Java code I have been using the NetBeans IDE, which has built-in support for CPU and memory profiling. I started NetBeans to carry out the CPU profiling, but for some unknown reason I was just not able to profile my JUnit test suite in it. Since I use NetBeans only for profiling, I have to redo a lot of setup on it every time, which I hate, and now I also had to figure out why the profiling was not running. I had no patience for that, so instead I checked whether any new profiling plugins had become available for Eclipse, and I came across JVM Monitor. And boy, am I loving it!

Profiling our tests highlighted the following issues in our code:

  • The code was scanning resource bundle files, a disk-IO-intensive operation, multiple times. I just cached them.
  • Many tests were loading the Spring file system application context. Some tests were doing it in the @Before method, causing the context to be reloaded before every test method in the class. This again is a disk-IO-intensive operation. After some refactoring of the test code, I could reuse the context in the majority of the tests (see the sketch after the code snippets below).
  • Our code was sending out emails during the tests, which was taking time and was not required. I skipped sending the emails.
  • There was a class that accepted a java.util.Date and the number of days to be added to it; the code snippet is below. This method was invoked thousands of times.

public static Date addDays(Date date, int numberOfDays) {
    Calendar calendar = Calendar.getInstance();
    calendar.setTime(date);
    calendar.add(Calendar.DATE, numberOfDays);
    return calendar.getTime();
}

To my surprise, this simple-looking code is not efficient at all: Calendar.getInstance() creates a brand new Calendar and looks up the default time zone and locale on every single call. I refactored the code as below, and it is much more efficient.

// Assumes a fixed 24-hour day; note that unlike the Calendar version,
// this does not adjust for daylight-saving transitions.
private static final long MILLISECONDS_IN_ONE_DAY = 24L * 60 * 60 * 1000;

public static Date addDays(Date date, int numberOfDays) {
    long offset = ((long) numberOfDays) * MILLISECONDS_IN_ONE_DAY;
    return new Date(date.getTime() + offset);
}
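Coming back to the Spring context issue above, the fix boils down to loading the context once per test class (or caching it in a static holder) rather than once per test method. A minimal sketch with JUnit 4 follows; the context file path and bean name are made up, and Spring's SpringJUnit4ClassRunner, which caches contexts across test classes, is another way to get the same effect.

import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;
import org.springframework.context.support.FileSystemXmlApplicationContext;

public class OrderServiceTest {

    // Loaded once for the whole class instead of before every test method
    private static FileSystemXmlApplicationContext context;

    @BeforeClass
    public static void loadContext() {
        context = new FileSystemXmlApplicationContext("conf/applicationContext.xml");
    }

    @AfterClass
    public static void closeContext() {
        context.close();
    }

    @Test
    public void createsOrder() {
        Object orderService = context.getBean("orderService");
        // ... exercise the service and assert the results in the test DB
    }
}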

After all the profiling and refactoring, most of the test jobs started finishing within 15 minutes. Without using SSDs or hybrid disks, I was able to get the CI build pipeline time down to about 25 minutes. Now it is no surprise why the RAM drives did not show much improvement on our actual Jenkins: disk IO unrelated to the DB was our bottleneck. So though SSDs are fast, it turns out that SSDs are not an alternative to sloppy programming! lol.

Next – On a quest of reducing Jenkins build time.

 

Test on smaller data set

Prev – The excitement and the disappointment

After going through the SAS logs I found that it was not performing operations on one large data set; it was performing a very large number of operations on very small data sets. I realized that the test was forecasting for the next 10 years, which was not necessary. I changed the forecast window configuration to 60 days, and that brought this specific job down from 50+ minutes to around 12 minutes. The solution felt very silly, something I probably should have looked at earlier. lol. My new hypothesis was proving correct. Next I had to look at the Java tests, and I thought profiling the code could give me more insights.

Next – CPU profiling to the rescue

The excitement and the disappointment

Prev – The “eureka” moment – discovery of RAM Disk Drives

Now there was no stopping me. I did not need an SSD: all I had to do was install the RAM drive on our actual Jenkins Windows machine, point the SAS temporary data set creation to tmpfs, get rid of all the database IO issues, and bring the build time on the Jenkins machine down from 1+ hour to 25 to 30 minutes. Ajay allotted more RAM to both the Jenkins and the SAS Linux VMs. I did the needful and triggered the build process, and to my astonishment the build timings did not reduce much on our actual Jenkins and SAS Linux machines. I thought I had not done something right, so I reconfirmed the changes on both machines and re-triggered the build, but the result was the same. I was completely shocked and disappointed. It was supposed to work, yet my hypothesis had failed, and I was wondering why. The only reason I could imagine was that the SAS data set operations and the MySQL database operations our tests performed were not as IO-intensive as I had assumed, and something else was at play. This became my new hypothesis.

Next – Test on smaller data set

The “eureka” moment – discovery of RAM Disk Drives

Prev – The alternative for SSD – in-memory/in-process db

I was feeling very impatient and was wondering what to do. Was there an alternative to SSDs? While thinking it over, the word “memsql” gave me a “eureka” moment: what if I could create a drive in RAM and keep the entire MySQL DB on it? Though it would not be in-process, it would be completely in-memory, and without any code changes it would be faster than any hard disk, hybrid disk or SSD. That was just a hypothesis, and to my surprise I found that there is software that converts spare RAM into a RAM drive. I also came to know that Linux (and any Unix flavor) already has this built in with tmpfs. I came across the blogs “RAMDisks Roundup and testing” and “RAM Disk Software Benchmarked”, and based on the available benchmarks I shortlisted “SoftPerfect RAM Disk”. I was super excited, tried it immediately on the test VM servers, and the build time did come down further! The hypothesis had worked: the build time on the test VM server came down to 25 minutes.
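On the Linux side, tmpfs needs no extra software. A sketch of the idea, with a placeholder size and mount point (the -work option, which relocates SAS temporary data sets, goes in the SAS config file or on the command line):

# Create a RAM-backed file system
mount -t tmpfs -o size=8g tmpfs /mnt/ramdisk

# Point SAS temporary data sets at it, e.g. in sasv9.cfg:
# -work /mnt/ramdisk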

Next – The excitement and the disappointment

The alternative for SSD – in-memory/in-process db

Prev – The assumption around IO and SSD

To get around the IO issue with the MySQL queries, I had already looked at in-memory and/or in-process DB options: the MySQL MEMORY engine, HSQLDB, H2 and MemSQL. The MySQL MEMORY engine has some limitations compared with the MyISAM engine and did not bring down the test time much. HSQLDB did not support many of our MySQL queries, so it was discarded. H2 looked promising, as it could support many MySQL queries, but it still required a couple of modifications to our code, which I thought was not worth the time and effort. Secondly, it would have solved the issue only for our unit tests; what would we do with the integration tests, where code from two machines has to connect to the same DB and the in-process option cannot be used? Girish is still working on it; let’s see where we land with it. MemSQL looked the most promising, as it is wire-compatible with MySQL, meaning that without code changes I could just point to MemSQL and be done. Two aspects discouraged me from going down that path: firstly, MemSQL can only be installed on Linux; secondly, I came across the blog “MySQL is bazillion times faster than MemSQL”.
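For reference, the code change the H2 route needs is small; a minimal sketch, assuming the H2 jar is on the classpath (the DB and table names are made up). MODE=MySQL turns on H2's MySQL compatibility mode, and DB_CLOSE_DELAY=-1 keeps the in-memory DB alive for the lifetime of the JVM rather than just the first connection:

import java.sql.Connection;
import java.sql.DriverManager;

public class H2CompatibilityDemo {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:h2:mem:testdb;MODE=MySQL;DB_CLOSE_DELAY=-1";
        try (Connection conn = DriverManager.getConnection(url, "sa", "")) {
            conn.createStatement().execute("CREATE TABLE seed (id INT PRIMARY KEY)");
            conn.createStatement().execute("INSERT INTO seed VALUES (1)");
        }
    }
}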

Next – The “eureka” moment – discovery of RAM Disk Drives

The assumption around IO and SSD

Prev – The discovery of Ant JUnit task options

We use MySQL, and most of our tests make database calls. We use an empty DB with seed data for our tests; the tests themselves create the required test data, invoke the method under test and then assert the output in the DB. All that database IO was one of the reasons our tests were running slow, or at least that was my assumption. I was not in a position to reduce the DB dependency of our tests immediately, yet I still had to get around the problem right away. We run SAS on a Linux VM, and my second assumption was that the SAS data set operations are IO-intensive. So it was clear in my mind that an SSD was going to solve the rest of my problem.

My colleague Ajay from IT helped me by setting up a VM server with a hybrid disk and a physical desktop machine with an SSD so that I could test the Jenkins timings. Both machines brought the job timings down significantly: from 1+ hour we were down to about 35 minutes, with the SSD desktop faster than the VM server with the hybrid drive. Procuring VM servers with SSDs was expensive and time consuming, requiring budget changes and approvals; hybrid drives are cheaper than SSDs. In the meantime, Ajay started digging into the hardware to figure out why the VM with the hybrid drive was not performing to our expectations. I am hoping Ajay will publish his own blog post on that experience.

Next – The alternative for SSD – in-memory/in-process db

The discovery of Ant JUnit task options

Prev – The problem with duplication

We use Ant, and we were using the fork="yes" option of the Ant junit task. After reading more, I understood that when the fork option is turned on, the default forkmode is “perTest”, i.e. a new JVM gets created for every test class, which is an expensive operation. Instead of “perTest” I used the “once” option, which creates a single JVM for all the tests. With this option the tests started failing with PermGen errors; after increasing the PermGen memory size, the tests ran faster. This shaved about 20 more minutes off our unit tests, though they still took over 30 minutes to run, and our SAS integration tests were still taking 50+ minutes to complete.
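A sketch of what the change looks like in the build file; the classpath reference, directories and PermGen size are placeholders rather than our actual values:

<junit fork="yes" forkmode="once" printsummary="on">
    <!-- A single JVM for all tests needs a larger PermGen -->
    <jvmarg value="-XX:MaxPermSize=256m"/>
    <classpath refid="test.classpath"/>
    <batchtest todir="${test.reports.dir}">
        <fileset dir="${test.classes.dir}" includes="**/*Test.class"/>
    </batchtest>
</junit>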

Next – The assumption around IO and SSD

The problem with duplication

Prev – On a quest of reducing Jenkins build time.

While digging into the issue, I found that most of the test jobs were first deleting their workspace and then copying the workspace of the base job, roughly 1.8 GB, into their own workspace before running the tests. This activity was taking roughly 10 minutes. The time was easily recovered by using the mklink utility, which allows creating directory links on Windows. With the workspaces of the rest of the jobs linked to the workspace of the head job, there was no need to delete or copy workspaces at all. Girish, my colleague, later pointed out that I could also have used the “Use custom workspace” option available under “Advanced Project Options” in Jenkins. lol!
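For illustration, the link is created along these lines (the job names and paths are hypothetical); /J creates a directory junction, so the test job's workspace transparently points at the head job's files:

mklink /J C:\Jenkins\jobs\rest-tests\workspace C:\Jenkins\jobs\head-job\workspace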

Well, the problem with duplication is not limited just to code…

Next – The discovery of Ant JUnit task options