Migration to Google Cloud and Big Data

We were extremely successful in building an analytics platform on Amazon for our customer. The stack included the following:

  • EC2 Servers
  • Amazon S3
  • Nginx Gateway
  • Amazon VPC
  • Apache Cassandra – Big Data
  • Percona – Relational Database
  • Spark – Big Data
  • SSD Storage for Performance
  • Apache Mesos
  • Apache Marathon
  • Apache Chronos
  • Nagios and Cacti for monitoring
  • Vector and Monitorix for Performance
  • Docker Container
  • Logstash – Logging

The primary reasons for the migration were better performance and cost savings. Google's BigData platform was the main selling point. When we ran simple benchmarks of loading and querying data, the performance for the cost was unprecedented. Our savings in BigData costs were easily a factor of 8. We also had complete support from our customer to migrate to Google Cloud. The move was partly driven by delays in loading data and delivering actionable analytics on time, along with exploding costs.

Since BigQuery is a managed service, there is no cost for managing the infrastructure, and the pricing is reasonable. The most important benefit was that we could experiment and recover very quickly, which is almost impossible with a conventional big data solution, where a redesign is expensive in terms of resources, people, and time.

In most BigData solutions, data storage is distinct from the compute engine. In our stack above, the storage was Apache Cassandra and the compute engine was Spark. Another problem is that designing aggregate tables is expensive, and the workflow of using Apache Cassandra with Apache Spark is non-trivial.

BigQuery addresses all these problems really well, and it makes BigData implementation accessible to anyone with reasonable interest and basic skills.
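
To illustrate the point about aggregate tables: in BigQuery an aggregate can usually be computed on demand with a SQL query instead of being designed and maintained as a separate table. The following is only a minimal sketch, assuming the google-cloud-bigquery Python client and a hypothetical table with event_date and event_type columns. The table name, column names, and function names are illustrative assumptions, not taken from our actual deployment.

```python
def build_daily_counts_query(table: str, days: int = 7) -> str:
    """Return standard SQL that counts events per day and event type.

    `table` is a hypothetical fully qualified name such as
    "my_project.analytics.events"; the columns are assumptions too.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    return (
        "SELECT event_date, event_type, COUNT(*) AS events\n"
        f"FROM `{table}`\n"
        f"WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL {int(days)} DAY)\n"
        "GROUP BY event_date, event_type\n"
        "ORDER BY event_date, event_type"
    )


def run_daily_counts(table: str, days: int = 7):
    """Run the aggregate on demand; no pre-built aggregate table is needed."""
    # Imported lazily so the query builder works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses application default credentials
    rows = client.query(build_daily_counts_query(table, days)).result()
    return [(row.event_date, row.event_type, row.events) for row in rows]
```

Calling run_daily_counts("my_project.analytics.events", 30) would return one tuple per day and event type for the last 30 days, given valid credentials and a table of that shape. Changing the grouping means editing the query, not redesigning a table.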

The Google Cloud and BigData stack used:

  • Google Compute Engine
  • Kubernetes
  • Nginx
  • Google Container Engine
  • Percona – Relational Database
  • Docker Container
  • Google Logs
  • Nagios and Cacti
  • Google BigQuery

Clearly, the number of moving parts in the infrastructure has been reduced considerably. Though the data growth is exponential, the infrastructure remains manageable, thanks to BigQuery.

A detailed post on the Google migration will soon be available on our blog.

  • Date July 17, 2016
  • Tags Cloud Computing