Google Compute Engine Review

You can use Google Cloud Datalab to explore, analyze, transform and visualize data and build machine learning models using existing data in Google Cloud Storage.

What is most valuable?

One of Cloud Datalab's most valuable features is its seamless integration with other Google Cloud Platform products. You can use Google Cloud Datalab to explore, analyze, transform and visualize data and build machine learning models using existing data in Google Cloud Storage or BigQuery. Cloud Datalab is interactive, so you can run portions of your code and see the results immediately as you work through your Datalab notebook.

How has it helped my organization?

Google Cloud Datalab has made it really easy for me to perform data exploration and analysis. Previously, I would have to wait for my data to transfer to a local server with a SQL database before beginning exploration. With larger data sets, the entire data transfer and analysis process could take hours, if not days. With Google Cloud Datalab and Google Cloud Platform, I receive processed results in seconds rather than hours or days, with no significant transfer delays, because my data is already in Google Cloud Storage and readily accessible within Google Cloud Platform. In addition, I am starting to leverage the Google Cloud Machine Learning APIs built into Google Cloud Datalab to get better insights from my data.

What needs improvement?

You can already run BigQuery SQL queries in Google Cloud Datalab, but I would like to see a better user experience when constructing these queries, similar to the tools that the BigQuery web console offers. For example, I would like to see improved support for automatically formatting SQL queries. It would also be helpful to have a button that checks a query for SQL syntax errors before running it, which the BigQuery web console already provides.

For how long have I used the solution?

I've used Google Cloud Datalab for about 18 months. In February 2016, I ran into an issue while configuring Datalab and opened a new issue on the Datalab GitHub page. I received a response on the same day from a Google employee on the core Datalab development team. After resolving my issue with the help of a Cloud Datalab development team member on GitHub, I became interested enough in Cloud Datalab that I decided to download the source code and build Datalab locally. I learned a lot in the process.

What do I think about the stability of the solution?

There have been a few rare instances where I encountered a stability issue during the beta (pre-production) stage of the product. In those cases, it was pretty easy to revert to a previous stable build. Stability issues are even rarer with the production version, and because the user base is very large, the development team is very quick to fix regressions and troubleshoot performance issues.

What do I think about the scalability of the solution?

I have not encountered any scalability issues, and I don't expect any, because the Datalab kernel runs in a Google Compute Engine (GCE) virtual machine that can be scaled according to the developer's needs. In addition, Google Cloud Platform is designed for petabyte-scale workloads.

How are customer service and technical support?

10/10. The Datalab development team is very responsive on both Stack Overflow and GitHub. I encourage you to make use of the free support that is available from the online Datalab community. You can even build and run Datalab from source code if you want to tinker and learn more about Datalab on your own. If you prefer Google-specific support, there are three support tiers available. You can also submit feedback directly from the Datalab user interface.

Which solution did I use previously and why did I switch?

Previously I used Jupyter (formerly IPython) as a tool for interactive data analysis. My primary reason for switching is that Datalab has built-in integration and high-level magic commands for certain Google Cloud Platform products, such as Google Cloud Storage and Google BigQuery. In addition, Datalab has built-in charting capabilities.

How was the initial setup?

The initial setup is very straightforward because the Datalab kernel is installed on a virtual machine in the cloud rather than on your local machine. In addition, the quick start documentation was very easy to follow. Google Cloud Datalab is installed using Google Cloud Shell, which is accessible from a web browser. This means that you can access the full Google Cloud Datalab user interface from a lightweight laptop such as a Chromebook.

What's my experience with pricing, setup cost, and licensing?

Google Cloud Datalab is an open source product (Apache 2.0 License). Running the Datalab kernel in a Compute Engine virtual machine incurs costs; however, you only pay for the cloud resources you use. To save on costs, you can stop the GCE virtual machine and start it when it is needed again. There may be other costs for additional resources that you decide to use, such as Google BigQuery or Google Cloud Storage.
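As a rough sketch of that stop-and-start routine, the Python snippet below builds the `gcloud compute instances stop` and `start` commands for a VM. It assumes the gcloud CLI is installed and authenticated; the instance name `my-datalab-vm` and the zone `us-central1-a` are hypothetical placeholders, and the helper names are my own. By default it only prints the command rather than running it.

```python
import shlex
import subprocess


def vm_command(action, instance, zone):
    """Build the gcloud command that stops or starts a Compute Engine VM."""
    if action not in ("stop", "start"):
        raise ValueError("action must be 'stop' or 'start'")
    return ["gcloud", "compute", "instances", action, instance, "--zone", zone]


def run_vm_command(action, instance, zone, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    cmd = vm_command(action, instance, zone)
    print(shlex.join(cmd))
    if not dry_run:
        # Raises CalledProcessError if gcloud exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd


# Hypothetical instance name and zone; replace with your own values.
run_vm_command("stop", "my-datalab-vm", "us-central1-a")
```

Passing `dry_run=False` would actually invoke gcloud. Keeping the dry run as the default makes it harder to stop a VM by accident while experimenting.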

Which other solutions did I evaluate?

Previously I used Jupyter Notebook. Google Cloud Datalab is built on Jupyter (formerly IPython), so it was easy to transition to Google Cloud Datalab.

What other advice do I have?

Don't hesitate to try Google Cloud Datalab if you need an interactive data visualization tool. Follow the quick start documentation and don't be afraid to get your feet wet. If you prefer a structured learning environment, there are also Google-approved paid courses available.

I could not have found a better product to perform interactive data analysis and begin my career as a Data Engineer. There are so many sample Datalab notebooks, which makes it really easy for someone new to run and modify a Datalab notebook regardless of their level of knowledge of big data or Python. After launching Datalab, simply click on the help icon in the navigation bar and then click the "Samples and Tutorials" link. Google Cloud Datalab is an open source project, so reporting bugs and submitting feature requests is easy. If you're feeling brave, you can even submit a pull request in the GitHub project to fix a bug or modify Datalab functionality. The project maintainers are really welcoming and encourage participation from new contributors. In addition, the Cloud Datalab community is very responsive on Stack Overflow.

Disclosure: I am a real user, and this review is based on my own experience and opinions.