If our developers have what they need to create the best possible applications, they—and our customers—benefit
Goji Investments was founded on the principle of democratizing access to real estate, business lending, renewables and other alternative investments. The idea is to allow financial service organizations to provide traditionally unavailable investment opportunities to private investors on a global scale. A critical component of the Goji plan for success is the investment platform. Not only do we make the application available as a white-label solution, but it also integrates easily with other platforms via APIs. We wanted to create a modular platform that we could update in pieces when needed and easily scale up as clients came online.
High-Performance Development Teams
A major reason for Goji Investments’ success in these first few years is our high-performance development organization, which is instrumental in building and delivering the vision of the investment products needed to generate new business. As we’ve moved from simply building our applications to making them more scalable, we’ve focused on getting the right tools in place for Ops and Dev. As a bonus, developers are happier when they have good tools to work with.
We try to keep our application/platform deployment process as agile as the applications themselves: constantly looking to improve and making updates whenever necessary, whether to support new technologies and applications or in response to feedback from the team. The idea is to keep our DevOps pipeline operating smoothly and efficiently.
One of our key operational strategies is that if we make our developers happy at work, they will deliver better quality applications, which results in happier end users. Of course, developers like to create, coming up with innovative ways to deliver great features. One roadblock to coding innovation is performing repetitive low-level tasks, so we’ve gone out of our way to automate whatever we can and free up Dev resources for higher-value application development.
Automating Deployments is a Start, but Just a Start
We automate the deployment process through GitHub. Each developer pushes their own services to GitHub, where the changes go through a quick review process. If the review passes, the changes are merged into master. Any merge to master triggers the build pipeline (built on ThoughtWorks GoCD), and this automation includes several rounds of testing: unit tests and integration tests, followed by an automated acceptance test. Developers have the option of running additional tests themselves when desired.
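As a rough sketch, a master-triggered pipeline like the one above could be expressed in GoCD’s YAML config plugin format along these lines (the pipeline, repository and Gradle task names are hypothetical, not our actual configuration):

```yaml
# Sketch of a master-triggered GoCD pipeline; all names are illustrative.
pipelines:
  payments-service:
    group: services
    materials:
      repo:
        git: https://github.com/example/payments-service.git
        branch: master   # a merge into master triggers this pipeline
    stages:
      - unit-and-integration:
          jobs:
            test:
              tasks:
                - exec:
                    command: ./gradlew
                    arguments: [test, integrationTest]
      - acceptance:
          jobs:
            acceptance:
              tasks:
                - exec:
                    command: ./gradlew
                    arguments: [acceptanceTest]
```

Each stage gates the next, so a failed unit or integration run never reaches acceptance testing.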
Every new master build is promoted into pre-prod, which mirrors production. When changes are finally rolled into production, we use HashiCorp Nomad’s blue-green deployment capabilities: if a problem occurs (meaning it was missed in official testing), Nomad takes care of any rollback needs.
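For illustration, blue-green behavior in Nomad comes from the `update` stanza: setting `canary` equal to the group’s `count` stands up a full green set next to the running blue one, and the green set only takes over once it is promoted (the job and image names below are hypothetical, not our real job spec):

```hcl
# Hypothetical Nomad job fragment; names and image are illustrative.
job "payments-service" {
  group "api" {
    count = 4

    update {
      # canary == count gives a blue-green deployment: Nomad starts a
      # full new ("green") set alongside the old ("blue") one and only
      # promotes it once healthy. If promotion never happens, the
      # canaries are destroyed and the old set keeps serving traffic.
      canary       = 4
      max_parallel = 4
      auto_revert  = true
    }

    task "api" {
      driver = "docker"
      config {
        image = "example/payments-service:latest"
      }
    }
  }
}
```

With `auto_revert` enabled, a deployment whose new allocations fail health checks rolls back to the last known-good job version without operator intervention.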
The stack is fairly straightforward: The applications are written in Java, run on the DropWizard framework and are deployed in containers, with Nomad as our orchestration platform. One of our foundational principles when selecting application and infrastructure technologies is the ability to automate our deployment processes. We’ve done a fairly good job of automating everything we possibly could, but automating monitoring was more difficult. Our traditional APM tool was fine for alerting us to issues such as an EC2 crash or a resource availability problem requiring a reboot, but if we wanted to actually troubleshoot an issue, we essentially had to examine logs and hope we spotted something out of the ordinary.
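To make the container part concrete, a DropWizard service image can be as simple as the following sketch (the jar and config file names are placeholders; DropWizard applications start via the `server` command with a YAML config):

```dockerfile
# Hypothetical image for a DropWizard service; file names are illustrative.
FROM eclipse-temurin:17-jre
WORKDIR /app
COPY build/libs/payments-service-all.jar app.jar
COPY config.yml config.yml
# DropWizard apps launch with the "server" command and a YAML config file.
ENTRYPOINT ["java", "-jar", "app.jar", "server", "config.yml"]
```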
As much as everyone is excited about writing data into logs, combing through those logs to troubleshoot application issues is a waste of time and valuable developer resources. After all, operating 20 different microservices covering everything from taxes and payments to customer notifications requires optimizing every element of the DevOps pipeline. The last piece we added was performance monitoring.
The Value of End-to-End Distributed Tracing
There’s a great line from Jurassic Park: “T-Rex doesn’t want to be fed. T-Rex wants to hunt.” Well, developers don’t want to be analyzing log files; they want to code. In fact, they want to be creating the next great feature. To get our dev team out of the weeds, we began thinking about a different way to get—and analyze—application performance data.
We wanted a system that added observability, monitored performance, managed service levels and included visibility from our end-user requests all the way into our back-end systems. Oh yeah, we also wanted all of that to happen automatically so that our developers never had to worry about anything other than their own features: installation, agent setup, discovery, mapping, alert configuration and reporting, with as little human setup as possible.
We knew tracing was critical: so critical that, to avoid missing any problems, we specified a 100% solution, a complete trace of every request to the application.
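The post doesn’t name a tracing stack, but as an example of what a “100% solution” means in practice, OpenTelemetry’s Java autoconfiguration expresses it as an always-on sampler instead of the usual probabilistic one:

```properties
# Trace every request rather than a sampled fraction.
otel.traces.sampler=always_on

# The probabilistic alternative a 100% requirement rules out:
# otel.traces.sampler=traceidratio
# otel.traces.sampler.arg=0.1
```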
Troubleshooting Application Performance Problems
The real reason you buy application monitoring software is to solve problems, so optimizing our ability to find bottlenecks and eliminate them is a key component of our monitoring solution. First, like the other aspects of monitoring, we didn’t want to have to spend time configuring thresholds for warning and danger alerts. Our need was for automatic anomaly detection with event analysis. Every little bit helps keep our dev team focused on dev, not analyzing data—especially in logs.
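To show the idea behind threshold-free alerting, here is a minimal sketch of anomaly detection against a learned baseline, flagging a sample only when it drifts several standard deviations from recent behavior (a toy illustration of the concept, not the vendor’s actual algorithm):

```java
import java.util.List;

public class AnomalyDetector {
    // Flags a latency sample as anomalous if it deviates from the
    // recent baseline by more than k standard deviations, so no
    // hand-configured warning/danger thresholds are needed.
    public static boolean isAnomalous(List<Double> baseline, double sample, double k) {
        double mean = baseline.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
        double variance = baseline.stream()
                .mapToDouble(v -> (v - mean) * (v - mean))
                .average().orElse(0.0);
        double stdDev = Math.sqrt(variance);
        // A flat baseline makes any deviation notable.
        if (stdDev == 0.0) return sample != mean;
        return Math.abs(sample - mean) > k * stdDev;
    }
}
```

The baseline adapts as recent samples change, which is what lets such a detector follow normal daily variation instead of firing on a fixed number.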
Of course, sometimes, you can’t help it; you have to look at log data, but we wanted to be smarter about that, too. We wanted our application monitoring tool to automatically integrate log analysis tools so that all data could be analyzed at the appropriate level whenever needed.
Don’t Forget Testing
Testing sometimes gets lost when weighing the value of a good monitoring solution. While we weren’t necessarily looking for a testing tool, we knew that if a monitoring solution could also operate effectively in test (not just run in a test environment, but actually add value to the testing process), well, so much the better.
It turns out that our applications are quite complex, and that complexity can lengthen test cycles, as the following scenario shows:
We recently released a new feature for distributing dividends. As you can imagine, this is a fairly important feature, so we wanted to ensure that it operated as desired and expected. We were especially concerned with being able to scale the feature appropriately. We set up a stress scalability test that took 20+ hours to run (essentially processing hundreds of thousands of payments).
What we didn’t count on was having to stop partway into the test so someone could either submit an optimization update or actually fix a bottleneck: we would get 10-12 hours into the test, then have to halt and restart it, over and over and over again. We started thinking we might never finish.
This is one of the pain points we hoped to solve with an automated way of monitoring. By pointing the automatic APM solution at our test environment, not only did we not have to configure anything (or even set up a special start command), but we also got almost immediate feedback on any change, both positive and negative. It also pinpointed the exact location in the code where the problem occurred, so the developer could almost immediately update the code and restart the test. Finally, we could pick any two runs and compare their execution and performance to identify the real problem.
Two things happened:
- We were able to iterate different runs ~5 times faster.
- We got such great bottleneck and debugging data from our solution that we cut the runtime of the test from 20 hours to less than one hour.
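The run-to-run comparison described above can be sketched as a simple diff of per-operation timings between a baseline run and a candidate run (the operation names and the regression threshold here are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

public class RunComparator {
    // Returns the operations whose average latency grew by more than
    // the given ratio between a baseline run and a candidate run,
    // mapped to the slowdown factor observed.
    public static Map<String, Double> regressions(
            Map<String, Double> baseline, Map<String, Double> candidate, double ratio) {
        Map<String, Double> out = new HashMap<>();
        for (Map.Entry<String, Double> e : candidate.entrySet()) {
            Double before = baseline.get(e.getKey());
            if (before != null && e.getValue() > before * ratio) {
                out.put(e.getKey(), e.getValue() / before);
            }
        }
        return out;
    }
}
```

Comparing runs this way narrows a 20-hour test’s worth of data down to the handful of operations that actually got slower.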
Give Developers the Tools to Shine
While all employees like to succeed, developers especially love rising above their challenges and delivering high-performance code. We like to give our dev team the best possible tools to help them achieve this excellence. Selecting a tool for them should be approached no differently than thinking about your end users:
- Identify pain points.
- Find a resolution.
- Execute the solution.
For our development team, that meant relieving them of repetitive, low-level tasks that we could automate, and no longer making them spend their time combing through performance logs.
Making the Right Choice
The long and short of it is that we replaced our legacy APM tool with a solution that delivers the automation we crave and eliminates the need for log deep-dive analysis to solve any problem. It also supports log analysis with various integrations, which was important to us.
We get a trace of every single call, which is important to the Ops and customer support teams, and the solution profiles every production process automatically, which gives our developers more than enough information to take action.
Ultimately, we’ve accelerated our application delivery and update processes and streamlined testing and monitoring. And our developers are super happy that we give them the tools they need and then challenge them to make the best possible applications using those tools.