By Nina Mulamba
Google has finally joined the bandwagon by launching Cloud Dataproc, a new service for Big Data processing and analytics. The platform supports real-time streaming, batch processing, querying, and machine learning. The service complements Google's existing data services such as BigQuery, Bigtable, Cloud Dataflow, and Cloud Pub/Sub.
Cloud Dataflow is a unified programming model and a managed service for developing and executing data processing patterns such as ETL, batch computation, and continuous computation. Cloud Pub/Sub is an asynchronous messaging engine, based on the publish-subscribe pattern, for sending and receiving messages.
Cloud Bigtable is a scalable NoSQL database that is compatible with HBase, while BigQuery is the managed data warehouse and analytics platform.
A typical Big Data platform needs services for ingestion, streaming, transformation, storage, querying, real-time processing, and batch processing. Cloud Dataproc therefore plugs a significant gap in Google Cloud Platform while integrating with the platform's existing services.
Google claims that its Big Data stack is much cheaper and faster than competing offerings, as operations such as starting, scaling, and shutting down a cluster take only 90 seconds.
This feature lets customers focus on the processing job rather than on managing clusters. The service is priced at 1 cent per virtual CPU per hour, in addition to the charges for the standard Google Cloud Platform resources it uses. This fee covers spinning up and running a Hadoop or Spark cluster.
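To make the pricing concrete, here is a minimal sketch of the arithmetic. It assumes the 1-cent rate applies per virtual CPU per hour; the function name, the cluster size, and the run time are hypothetical values chosen for illustration, and the result excludes the separate Compute Engine and storage charges.

```python
from decimal import Decimal

# Assumption: the 1-cent Dataproc rate applies per virtual CPU per hour.
DATAPROC_RATE_PER_VCPU_HOUR = Decimal("0.01")


def dataproc_fee(nodes: int, vcpus_per_node: int, hours: Decimal) -> Decimal:
    """Return the Dataproc fee in US dollars.

    This covers only the Dataproc surcharge, not the underlying
    Compute Engine, Bigtable, or Cloud Storage costs.
    """
    if nodes < 0 or vcpus_per_node < 0 or hours < 0:
        raise ValueError("nodes, vcpus_per_node, and hours must be non-negative")
    return DATAPROC_RATE_PER_VCPU_HOUR * nodes * vcpus_per_node * hours


# Hypothetical cluster: 6 nodes with 4 vCPUs each, running for 2 hours.
fee = dataproc_fee(nodes=6, vcpus_per_node=4, hours=Decimal("2"))
print(f"Dataproc fee: ${fee:.2f}")  # prints "Dataproc fee: $0.48"
```

Under these assumptions, a 24-vCPU cluster that runs for two hours adds 48 cents on top of the regular infrastructure bill.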
Cloud Dataproc is accessible from the Google Developers Console, the command line interface of the Google Cloud SDK, and the Cloud Dataproc REST API. The platform runs on Spark 1.5 and Hadoop 2.7.1.
Though Google is late to the party, its credibility in Big Data and the engineering innovations in Google Cloud Platform will help the company acquire customers.
Google still charges separately for Compute Engine, Bigtable, and Cloud Storage. Customers can take advantage of preemptible VMs for jobs that run for a short duration.