Prev – Test on smaller data set
I have been using the Eclipse IDE for my development for many years. A couple of years ago I looked for a good CPU profiling plugin for Eclipse and could not find one, so since then, just for profiling Java code, I have used the NetBeans IDE, as it has built-in support for CPU and memory profiling. I started NetBeans to carry out the CPU profiling and, for some unknown reason, I simply could not profile my JUnit test suite in it. Since I use NetBeans only for profiling, I have to redo a lot of setup on it every time, which I hate, and now I also had to figure out why the profiling was not running. I had no patience for that; instead I checked whether any new profiling plugins had become available for Eclipse. I came across JVM Monitor and, boy, am I loving it!
Profiling our tests highlighted the following issues in our code.
- The code was scanning resource bundle files, a disk-IO-intensive operation, multiple times. I simply cached them.
- Many tests were loading the Spring file system application context. Some tests were doing it in the @Before method, causing the context to be loaded before every test method in that class. This is again a disk-IO-intensive operation. After some refactoring of the test code, I could reuse the context in the majority of the tests.
- Our code was sending out emails during the tests, which took time and was not required. I skipped sending the emails.
- There was a class that accepted a java.util.Date and the number of days to be added to it. Code snippet below. This method was invoked thousands of times.
public static Date addDays(Date date, int numberOfDays) {
    Calendar calendar = Calendar.getInstance();
    calendar.setTime(date);
    calendar.add(Calendar.DAY_OF_MONTH, numberOfDays);
    return calendar.getTime();
}
To my surprise, this simple-looking code is not efficient at all. I refactored it as below, and it is much more efficient.
public static Date addDays(Date date, int numberOfDays) {
    long number = ((long) numberOfDays) * MILLISECONDS_IN_ONE_DAY;
    return new Date(date.getTime() + number);
}
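One caveat worth noting with the refactored version: it adds fixed 24-hour periods, while the Calendar version adds calendar days, so the two can differ by an hour across a DST transition. A self-contained sketch of both, with the constant spelled out:

```java
import java.util.Calendar;
import java.util.Date;

class DateUtil {
    // 24 hours in milliseconds; assumes every "day" is exactly 24 hours
    static final long MILLISECONDS_IN_ONE_DAY = 24L * 60 * 60 * 1000;

    // Original version: Calendar.getInstance() does timezone/locale lookups
    // and allocations on every call, which is costly in a hot loop.
    static Date addDaysCalendar(Date date, int numberOfDays) {
        Calendar calendar = Calendar.getInstance();
        calendar.setTime(date);
        calendar.add(Calendar.DAY_OF_MONTH, numberOfDays);
        return calendar.getTime();
    }

    // Refactored version: plain long arithmetic, no Calendar machinery.
    // Adds fixed 24-hour periods, so it can differ from the Calendar
    // version by an hour across a DST boundary.
    static Date addDaysFast(Date date, int numberOfDays) {
        return new Date(date.getTime() + numberOfDays * MILLISECONDS_IN_ONE_DAY);
    }
}
```

For our tests, which do not care about DST-exact dates, the arithmetic version was the right trade-off.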
After all the profiling and refactoring, most of the test jobs started finishing within 15 minutes. Without using SSDs or hybrid disks, I was able to get the CI build pipeline time down to about 25 minutes. Now it is no surprise why the RAM drives did not show much improvement on our actual Jenkins: disk IO not related to the DB was our bottleneck. So though SSDs are fast, it turns out that SSDs are no substitute for sloppy programming! lol.
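The context-reuse refactoring from the second bullet above boils down to holding the context in a static field so it is built once per JVM instead of once per test method. A plain-Java sketch of the idea (the Object here stands in for Spring's ApplicationContext; with Spring's test support, @ContextConfiguration and the TestContext framework give you this context caching for free):

```java
// Hypothetical sketch: cache the expensive "context" in a static field so
// it is created once per JVM rather than in every @Before invocation.
class SharedContext {
    static int loads = 0;                // instrumentation for this sketch only
    private static Object context;       // real code: Spring's ApplicationContext

    static synchronized Object get() {
        if (context == null) {
            loads++;                     // the disk-IO-heavy load happens only once
            context = new Object();      // real code: new FileSystemXmlApplicationContext(...)
        }
        return context;
    }
}
```

Each test's setup then calls SharedContext.get() instead of constructing its own context.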
After going through the SAS logs I found that it was not operating on a large data set; it was performing a large number of operations on very small data sets. I realized that the test was forecasting for the next 10 years, which was not necessary. I changed the forecast window configuration to 60 days, and that brought the time for this specific job down from 50+ minutes to around 12 minutes. The solution felt very silly, something I probably should have looked at earlier. lol. My new hypothesis was starting to prove correct. Next I had to look at the Java tests, and I thought profiling the code could give me more insights.
Next – CPU profiling to the rescue
Now there was no stopping me. I did not need an SSD; all I had to do was install the RAM drive on our actual Jenkins Windows machine, point SAS temporary data set creation to tmpfs, get rid of all the database IO issues, and bring the build time on the Jenkins machine down from 1+ hour to 25-30 minutes. Ajay allotted more RAM to both the Jenkins and the SAS Linux VMs. I did the needful and triggered the build, and to my astonishment, the build timings did not reduce much on our actual Jenkins and SAS Linux machines. I thought I had not done something right. I reconfirmed the changes on both machines and re-triggered the build, with the same result. I was completely shocked and disappointed. It was supposed to work, yet my hypothesis had failed, and I was wondering why. The only reason I could imagine was that the SAS data set operations and the MySQL database operations our tests performed were not as IO intensive as I had assumed, and something else was in play. This became my new hypothesis.
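For the record, the SAS-side change amounted to putting temporary data sets (the WORK library) on tmpfs. A sketch of what that looks like; the mount point and size are illustrative, and the -work option name should be verified against your own SAS configuration:

```shell
# mount a RAM-backed filesystem for SAS temporary data sets
mount -t tmpfs -o size=8g tmpfs /mnt/saswork

# then in sasv9.cfg (or on the sas command line):
#   -work /mnt/saswork
```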
Next – Test on smaller data set
I was feeling very impatient and wondering what to do. Was there an alternative to SSDs? While mulling it over, the word “memsql” gave me a “eureka” moment: what if I could create a drive in RAM and keep the entire MySQL db on it? Though it would not be in-process, it would be completely in-memory, and without any code changes it would be faster than any hard disk, hybrid disk, or SSD. That was just a hypothesis, and to my surprise I found that there is software that will convert spare RAM into a RAM drive. I also came to know that Linux (and any Unix flavor) already has this built in with tmpfs. I came across two blogs, “RAMDisks Roundup and testing” and “RAM Disk Software Benchmarked”, and based on the available benchmarks I shortlisted “SoftPerfect RAM Disk”. I was super excited, immediately tried it on the test VM servers, and the build time did come down further!!! The hypothesis had worked: the build time on the test VM server came down to 25 minutes.
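On Linux, no extra software is needed for this. A minimal sketch of the tmpfs approach for the database (the mount point, size, and paths are illustrative; data on tmpfs disappears on reboot, which is acceptable for a throwaway CI database):

```shell
# create a RAM-backed filesystem
mount -t tmpfs -o size=4g tmpfs /mnt/ramdisk

# copy the seeded database onto it
cp -a /var/lib/mysql /mnt/ramdisk/mysql

# then point MySQL at it in my.cnf and restart mysqld:
#   [mysqld]
#   datadir = /mnt/ramdisk/mysql
```

On Windows, SoftPerfect RAM Disk gives you the equivalent: a drive letter backed by RAM that you can point the MySQL datadir to.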
To get around the IO issue with MySQL queries, I had already looked at in-memory and/or in-process DB options like the MySQL memory engine, HSQLDB, H2, and MemSQL. The MySQL memory engine had some limitations compared to the MyISAM engine and did not bring the test time down much. HSQLDB did not support many of our MySQL queries, so it was discarded. H2 looked promising, as it could support many MySQL queries, but it still required a couple of modifications to our code, which I thought was not worth the time and effort. Secondly, it would have solved the issue only for our unit tests; what would we do with the integration tests, where code from two machines had to connect to the same db? In that case we could not use the in-process option. Girish is still working on it; let's see where we land with it. MemSQL looked the most promising, as it is wire compatible with MySQL, which means that without code changes I could just point at MemSQL and be done with it. Two aspects discouraged me from going down that path: firstly, MemSQL can only be installed on Linux; secondly, I came across the blog “MySQL is bazillion times faster than MemSQL”.
We use MySQL, and most of our tests make database calls. We use an empty db with seed data for our tests; the tests themselves create the required test data, invoke the method under test, and then assert on the output in the db. The heavy database IO involved was one of the reasons our tests were running slow, or at least that was my assumption. Since I was not in a position to reduce the db dependency in our tests immediately, I still had to work around the problem right away. We run SAS on a Linux VM, and my second assumption was that the SAS data set operations are IO intensive. So it was clear in my mind that an SSD was going to solve the rest of my problem.
My colleague Ajay from IT helped me by setting up a VM server with a hybrid disk and a physical desktop machine with an SSD for me to test the Jenkins timings. After the setup, both machines brought the job timings down significantly: from 1+ hour we were down to about 35 minutes. The physical desktop with the SSD was faster than the VM server with the hybrid drive. Procuring VM servers with SSDs was expensive and time consuming, requiring budget changes and approvals; hybrid drives are cheaper than SSDs. In the meantime, Ajay started digging into the hardware to figure out why the VM with the hybrid drive was not performing to our expectations. I am hoping Ajay will publish his own blog on his experience.
Prev – The problem with duplication
We use Ant, and we were using the fork="yes" option of the Ant junit task. After reading more, I understood that when the fork option is turned on, the forkmode option defaults to "perTest", i.e. a new JVM gets created for every test class. This is an expensive operation. Instead of "perTest", I used the "once" option, which creates only one JVM for all the tests. With this option the tests started failing with permgen errors; after increasing the permgen memory size, the tests ran faster. That shaved off about another 20 minutes from our unit tests, though they were still taking over 30 minutes to run. Our SAS integration tests were still taking 50+ minutes to complete.
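The change itself is a one-attribute tweak to the junit task, plus the permgen bump for the now long-lived JVM. A sketch, where the classpath reference, memory sizes, and report paths are illustrative, not our actual build file:

```xml
<junit fork="yes" forkmode="once" maxmemory="512m" printsummary="on">
  <!-- forkmode="once": one JVM for the whole test run instead of one per test class -->
  <jvmarg value="-XX:MaxPermSize=256m"/>  <!-- raise permgen for the long-lived JVM -->
  <classpath refid="test.classpath"/>
  <formatter type="xml"/>
  <batchtest todir="${report.dir}">
    <fileset dir="${test.src.dir}" includes="**/*Test.java"/>
  </batchtest>
</junit>
```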
While digging into the issue, I found that most of the test jobs were first deleting their workspace and then copying the workspace of the base job, roughly 1.8 GB, into their own workspace before running tests. This activity was taking roughly 10 minutes. The time was easily eliminated by using the mklink utility, which allows creating directory links on Windows. With the workspaces of the rest of the jobs linked to the workspace of the head job, there was no need to delete or copy workspaces at all. Girish, my colleague, later pointed out that I could also have used the "Use custom workspace" option available under "Advanced Project Options" in Jenkins. lol!
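The link is a one-time command per job; the paths below are illustrative, not our actual job names. The /J flag creates a directory junction, which, unlike a symbolic link (/D), does not require administrator rights:

```bat
rem Point the test job's workspace at the base job's workspace
mklink /J "C:\Jenkins\jobs\unit-tests\workspace" "C:\Jenkins\jobs\base-build\workspace"
```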
Well, the problem with duplication is not just limited to code…
In my organization we use Jenkins as our CI tool. The core build is followed by multiple jobs (unit tests, integration tests, SAS integration tests, PMD), all running in parallel and executing 3000+ tests, which took the entire build pipeline over 1 hour 30 minutes to produce the build artifacts. That was far too long, and it was very frustrating, especially when tests failed: multiple developers would check in files while the earlier build was in progress, the next job would start, and by the time an issue was identified and fixed, it would take more than 4-5 hours to get a stable build. QA would not get build artifacts on time. Valuable development time was being lost. There were frustrations all around.
This persistent issue set me off on a quest to reduce the CI build time.
Today Ajay moved our Jenkins VM to a box with a hybrid disk, and the build pipeline time has now dropped from 25 minutes to 15 minutes, with all the test jobs running in less than 10 minutes!!! I am feeling very happy and satisfied with my quest to reduce the build time. The journey took more than two months, during which I learnt a lot.
- Jenkins – http://jenkins-ci.org/
- CI – http://en.wikipedia.org/wiki/Continuous_integration
- Mklink – http://technet.microsoft.com/en-us/library/cc753194.aspx
- MySQL – http://www.mysql.com/
- SSD – http://en.wikipedia.org/wiki/Solid-state_drive
- Hybrid disk – http://en.wikipedia.org/wiki/Hybrid_drive
- HSQL – http://hsqldb.org/
- H2 – http://www.h2database.com/html/main.html
- MemSQL – http://www.memsql.com/
- MySQL is bazillion times faster than MemSQL
- tmpfs – http://en.wikipedia.org/wiki/Tmpfs
- RAM Disk Software Benchmarked