What is our primary use case?
Our primary use case was managing PNRs (passenger name records) and user journey plans for an airline: check-in times, airport arrival times, boarding times, and so on. We stored all of that data in Cassandra. I currently work as a chief technology officer.
How has it helped my organization?
The solution handled more than 100K PNRs a second. Because the company was international, we had heavy data writes and, at the same time, heavy data reads. Cassandra helped us a lot, especially with the heavy writes, and it was an amazing solution for us.
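Cassandra's write-heavy performance is usually attributed to its log-structured write path: a write is appended to a commit log and applied to an in-memory memtable, which is later flushed to an immutable SSTable, so no read is needed before a write. The Python below is a toy model of that idea under simplifying assumptions (the class name and the three-row flush threshold are hypothetical), not Cassandra's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWritePath:
    """Toy log-structured write path: commit log + memtable + immutable SSTables.

    A write is a sequential append plus an in-memory update, with no read-before-write.
    """

    memtable_limit: int = 3  # hypothetical flush threshold, counted in rows
    commit_log: list = field(default_factory=list)  # append-only durability log
    memtable: dict = field(default_factory=dict)  # in-memory, latest value per key
    sstables: list = field(default_factory=list)  # immutable sorted flushes, oldest first

    def write(self, key, value):
        self.commit_log.append((key, value))  # sequential append
        self.memtable[key] = value  # overwrite in memory, no disk read
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Persist the memtable as an immutable, key-sorted "SSTable".
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []  # simplification: flushed data no longer needs replay

    def read(self, key):
        # Newest data wins: check the memtable, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None


store = ToyWritePath()
for i in range(5):
    store.write(f"PNR{i}", {"status": "booked"})
store.write("PNR0", {"status": "checked-in"})  # a later write for the same key
print(len(store.sstables), store.read("PNR0"))  # 2 {'status': 'checked-in'}
```

Reads pay the cost instead, since they may have to consult several SSTables. Real Cassandra mitigates this with compaction and per-SSTable indexes.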
What is most valuable?
I think the time-series data support was one of the best features, along with automatic data expiry (time-to-live). For logging, for example, you can specify that you won't need the data after 30 days, and it is removed automatically. It was a great fit for our requirements. The other good thing is that every node in the cluster synchronizes the data in real time. That is something we loved.
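The 30-day example corresponds to Cassandra's time-to-live (TTL) feature. In CQL, a TTL can be set per write with `USING TTL` (30 days is 2,592,000 seconds) or per table with the `default_time_to_live` option; expired data is no longer returned and is physically removed later during compaction. The Python sketch below only models the read-side behavior, taking an explicit `now` timestamp so it is deterministic; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

THIRTY_DAYS = 30 * 24 * 60 * 60  # 2,592,000 seconds


@dataclass
class ToyTtlTable:
    """Toy table where every write carries a time-to-live (TTL) in seconds."""

    default_ttl: int = THIRTY_DAYS  # analogous to a table-level default TTL
    rows: dict = field(default_factory=dict)  # key -> (value, expires_at)

    def insert(self, key: str, value: Any, now: float, ttl: Optional[int] = None) -> None:
        ttl = self.default_ttl if ttl is None else ttl  # per-write TTL overrides the default
        self.rows[key] = (value, now + ttl)

    def select(self, key: str, now: float) -> Optional[Any]:
        entry = self.rows.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            # Expired: behave as if the row no longer exists, and drop it.
            del self.rows[key]
            return None
        return value


logs = ToyTtlTable()
logs.insert("log-1", "user logged in", now=0)
print(logs.select("log-1", now=THIRTY_DAYS - 1))  # user logged in
print(logs.select("log-1", now=THIRTY_DAYS))  # None
```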
What needs improvement?
One of the issues with the solution is that you cannot join tables as you can in MongoDB and MySQL. Cassandra doesn't support joins between tables, so you need other tools for that: you have to read all the data into memory and then perform the joins there. That is the area where I think they need to improve. Secondly, when setting up your keys, you have to be very sure about the read mechanism, because if you don't follow it and mistakenly build a key that is no longer unique, you start overwriting data. There are a lot of improvements they could make, including on the OS side.
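Because CQL has no `JOIN`, the workaround described above means reading both sides into memory and combining them in application code. A minimal sketch, assuming the rows have already been fetched as Python dicts from two hypothetical tables (bookings and flights), is an in-memory hash join:

```python
from typing import Dict, Iterable, List


def hash_join(left: Iterable[dict], right: Iterable[dict], key: str) -> List[dict]:
    """Inner-join two row sets in memory on a shared column (a simple hash join)."""
    index: Dict[object, List[dict]] = {}
    for row in right:  # build phase: index the right-hand rows by the join column
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:  # probe phase: look up matches for each left-hand row
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Hypothetical rows, as if already read from two separate Cassandra tables.
bookings = [
    {"pnr": "ABC123", "flight": "XY100"},
    {"pnr": "DEF456", "flight": "XY200"},
]
flights = [
    {"flight": "XY100", "boarding_time": "09:15"},
    {"flight": "XY200", "boarding_time": "13:40"},
]

for row in hash_join(bookings, flights, key="flight"):
    print(row)
```

Both inputs have to be loaded, which is the memory cost the reviewer points out. The usual Cassandra alternative is to denormalize: store the data already combined in a table designed for that specific query.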
What do I think about the stability of the solution?
The stability is good, although the solution sometimes slows down. I liked it, and it's good for big data.
What do I think about the scalability of the solution?
The solution is scalable. If you need more nodes in your cluster, you can simply turn on a new node, and it automatically starts syncing data with that node in real time. That is a real boost, and the best part. The entire company was using the solution.
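Adding a node works this smoothly because Cassandra places data on a token ring using consistent hashing: each node owns a range of tokens, and a new node takes over part of one range and streams the matching data from existing nodes, while all other keys stay where they are. The toy ring below assumes one token per node (no virtual nodes), no replication, and MD5 as the hash (Cassandra's default partitioner uses Murmur3); node names and keys are made up.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 here, as a stand-in hash)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class ToyRing:
    """Toy token ring: each node owns keys in (previous node's token, its own token]."""

    def __init__(self, nodes):
        self.ring = sorted((token(n), n) for n in nodes)

    def add_node(self, node: str) -> None:
        bisect.insort(self.ring, (token(node), node))

    def owner(self, key: str) -> str:
        tokens = [t for t, _ in self.ring]
        i = bisect.bisect_left(tokens, token(key))  # first node token >= key token
        return self.ring[i % len(self.ring)][1]  # wrap around past the last token


keys = [f"PNR{i}" for i in range(10_000)]
ring = ToyRing(["node-a", "node-b", "node-c"])
before = {k: ring.owner(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.owner(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
# Every key that moved now belongs to the new node; the rest stay where they were.
print(len(moved), all(after[k] == "node-d" for k in moved))
```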
How are customer service and technical support?
We went through a vendor, and they supported us on technical issues and were very good. I do think they need to improve their documentation.
Which solution did I use previously and why did I switch?
I have also used MongoDB, which, from a technology perspective, is collection-based, while Cassandra keeps data in tables. That's a major difference. Every platform has its pros and cons. Cassandra does not support relational scenarios, so you need to use third-party tools to manage relations. Those are the main differences, and Cassandra does have a table structure, which MongoDB does not.
How was the initial setup?
The vendor helped us with implementation. We had a team of around 25 people working on the deployment. Because we deployed in multiple regions, it definitely took a few hours, but a three-node cluster, for example, can be set up in a couple of hours. It's a matter of understanding the architecture; once you have that, you can decide on the configuration.
What's my experience with pricing, setup cost, and licensing?
This was for an enterprise company, and enterprise licenses are expensive. Cassandra's pricing is heavy because it's a yearly license. I'm pretty sure we were paying around $50,000 annually at the time.
What other advice do I have?
I would suggest not overcomplicating things. If you really need heavy writes and you are okay with building keys yourself, then go with Cassandra. If not, collection-based databases such as MongoDB are there, and MongoDB is the best one. If you are not an enterprise, don't kill yourself. Once I started working on Cassandra, the biggest lesson for me was that I need to build keys to retrieve data. If my keys, including the primary key, are not well designed and well configured, it is very tough for me to read the data.
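The key-design lessons here and in the earlier overwrite warning come from two properties of Cassandra tables: an `INSERT` with an existing primary key replaces the row instead of failing (CQL offers `IF NOT EXISTS` as an opt-in guard, at extra cost), and efficient reads start from the partition key. The toy Python model below, with hypothetical flight and PNR columns, illustrates both: keying check-ins by flight alone loses a passenger, while a composite key (flight as partition key, PNR as clustering key) keeps every row and still reads a whole flight in one lookup.

```python
from collections import defaultdict


class ToyTable:
    """Toy Cassandra-style table: partition key -> clustering key -> row.

    Writing an existing primary key silently replaces the row (an upsert),
    and efficient reads must start from the partition key.
    """

    def __init__(self, partition_col, clustering_col=None):
        self.partition_col = partition_col
        self.clustering_col = clustering_col
        self.partitions = defaultdict(dict)

    def insert(self, row):
        ck = row[self.clustering_col] if self.clustering_col else None
        self.partitions[row[self.partition_col]][ck] = row  # upsert, no uniqueness check

    def select_partition(self, partition_value):
        # One lookup by partition key; filtering on any other column would mean scanning every partition.
        return list(self.partitions.get(partition_value, {}).values())


checkins = [
    {"flight": "XY100", "pnr": "ABC123", "seat": "12A"},
    {"flight": "XY100", "pnr": "DEF456", "seat": "12B"},
]

bad = ToyTable("flight")  # key is not unique per passenger
good = ToyTable("flight", "pnr")  # flight groups the data, pnr makes each row unique
for row in checkins:
    bad.insert(row)
    good.insert(row)

print(len(bad.select_partition("XY100")))  # 1: the second check-in overwrote the first
print(len(good.select_partition("XY100")))  # 2
```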
I would rate this solution a seven out of 10.
Which deployment model are you using for this solution?