Google Professional Cloud Architect on Google Cloud Platform Exam Dumps and Practice Test Questions Set 7 Q91-105
Question 91:
A transportation analytics startup needs to process GPS location data from thousands of vehicles every second. They want to build a scalable, low-latency pipeline that ingests streaming data, enriches it with metadata, and writes it to BigQuery for real-time dashboards. Which GCP service should they choose for the stream processing component?
A) Cloud Functions
B) Dataflow
C) Cloud Run
D) Dataproc
Answer: B)
Explanation:
A transportation analytics startup processing GPS data from thousands of vehicles must support continual ingestion, transformation, enrichment, and loading of data without downtime. A stream processing system must handle high event volume and variability in throughput as traffic conditions change. A managed stream and batch processing platform supports windowing, watermarking, event-time processing, and stateful processing, enabling the real-time transformations essential for location data analysis. GPS streams must be enriched with metadata such as route identifiers, driver IDs, or vehicle model information, which requires a processing engine capable of combining multiple data sources. The platform provides native integration with Pub/Sub for ingestion and the ability to write results directly into BigQuery for analytics.
The company needs a low-latency environment capable of autoscaling to accommodate unpredictable bursts in location updates, such as during rush hours or traffic incidents. The service supports autoscaling of workers, fault tolerance, and checkpointing, which ensure consistent processing even when messages accumulate quickly. Long-running streaming pipelines also require stable environments and management of backpressure; a managed service handles recovery and scaling automatically without user intervention.
A serverless function platform is good for isolated event processing but is not suited for continuous time-window processing or stateful streams. Since location analytics often uses sliding windows for computing speeds, distances, and anomalies, the development team needs sophisticated streaming semantics beyond what short-lived functions can provide.
A serverless container platform is suitable for microservices but lacks core streaming capabilities such as event-time alignment, watermarks, and complex pipeline orchestration. It also does not scale for the heavy, continuous computation workloads typically required for telemetry pipelines.
A managed Hadoop and Spark cluster can run streaming jobs using Spark Structured Streaming, but it requires provisioning, configuration, and ongoing maintenance. This increases operational complexity and reduces the company’s agility.
A fully managed Beam-based service provides high-level APIs, allowing teams to focus on business logic rather than infrastructure. It integrates with BigQuery, allowing near real-time dashboards to show vehicle routes, congestion patterns, ETA estimates, and fleet performance metrics. It also supports side inputs for enrichment, enabling the retrieval of static metadata stored in Cloud Storage or Bigtable. For anomaly detection, machine learning models from Vertex AI can be embedded directly into pipelines, enabling predictive insights such as early detection of route deviations or unsafe driving behaviours. The startup benefits from the service’s reliability and scalability while building a highly responsive location tracking platform. Therefore, the best choice is the fully managed ETL and streaming pipeline service.
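For illustration, a minimal Apache Beam (Python SDK) sketch of such a pipeline might look like the following. It reads GPS events from Pub/Sub, enriches them with a static route lookup supplied as a side input, and streams results into BigQuery. The project, topic, table, schema, and field names are assumptions rather than values from the scenario.

```python
# Minimal sketch of a streaming Beam pipeline intended for the Dataflow runner.
# Topic, table, and field names are illustrative assumptions.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import FixedWindows


def enrich(event, routes):
    """Attach static route metadata (supplied as a side input) to each GPS event."""
    return {**event, "route_name": routes.get(event.get("route_id"), "unknown")}


def run():
    options = PipelineOptions(streaming=True)  # runner/project/region set via CLI flags
    with beam.Pipeline(options=options) as p:
        routes = p | "StaticRoutes" >> beam.Create([("r1", "Downtown loop")])
        (
            p
            | "ReadGPS" >> beam.io.ReadFromPubSub(
                topic="projects/my-project/topics/vehicle-locations")
            | "Parse" >> beam.Map(lambda b: json.loads(b.decode("utf-8")))
            | "Window" >> beam.WindowInto(FixedWindows(60))  # 1-minute windows
            | "Enrich" >> beam.Map(enrich, routes=beam.pvalue.AsDict(routes))
            | "WriteBQ" >> beam.io.WriteToBigQuery(
                "my-project:fleet.positions",
                schema="vehicle_id:STRING,route_id:STRING,route_name:STRING,"
                       "lat:FLOAT,lng:FLOAT,ts:TIMESTAMP",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
        )


if __name__ == "__main__":
    run()
```

Running the same script with the DataflowRunner flags turns this into a managed, autoscaled streaming job without code changes.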
Question 92:
A global logistics corporation wants to deploy an internal application used by tens of thousands of employees worldwide. The application requires global load balancing, autoscaling, zero downtime during updates, and multi-region redundancy. It must run in containers. Which deployment model should they choose?
A) Compute Engine autoscaled MIG
B) Cloud Run
C) GKE Regional Cluster
D) App Engine Standard
Answer: C)
Explanation:
A global logistics organisation supporting tens of thousands of internal users must run an application that can withstand regional failures, handle large bursts in traffic, perform rolling updates safely, and maintain consistent low latency across continents. Containerization indicates the team prefers a Kubernetes-based approach. A multi-zone or multi-region cluster provides high availability and automatic failover, ensuring the system remains operational during outages.
A regional Kubernetes cluster replicates control planes and nodes across multiple zones within a region, improving resilience against zone failures and ensuring rolling updates occur without downtime. It supports horizontal and vertical autoscaling, making it suitable for large user bases with dynamic load requirements. Global load balancing can route user traffic to the nearest cluster, reducing latency. The corporation needs fine-grained control over deployments, security policies, resource allocation, and networking. Kubernetes provides these capabilities along with native support for rolling updates and blue-green deployments, which are essential for maintaining uninterrupted service.
A managed instance group on virtual machines is flexible but requires more maintenance. Engineers must handle OS patching, container runtimes, and scaling logic. It lacks the rich orchestration features of Kubernetes. A serverless container environment is highly scalable and easy to use, but it is not designed for complex, large-scale internal systems requiring fine-tuned control, service mesh integration, or advanced routing strategies. Furthermore, it does not offer built-in multi-region cluster architectures or multi-zone replication at the infrastructure level. A PaaS offering provides ease of use but lacks the container orchestration capabilities, custom control, and multi-zone cluster resilience required by massive enterprise systems.
The corporation also needs integration with centralised IAM, service meshes, internal APIs, and custom compute configurations. Kubernetes networking policies allow secure and fine-grained workload communications. Additionally, GKE integrates with Cloud Logging, Monitoring, and Identity-Aware Proxy for secure internal access. For rolling updates, GKE provides controlled rollout strategies, enabling teams to reduce risk by deploying new versions gradually.
Since logistics operations depend on highly available systems—such as package tracking, shipment scheduling, fleet dispatching, and warehouse management—the company must ensure consistent uptime. For global redundancy, they can deploy multiple regional clusters behind Cloud Load Balancing. This architecture allows failover between regions if one becomes unavailable. The combination of container flexibility, operational control, global load balancing, and zero-downtime rolling updates makes a regional Kubernetes cluster the correct choice.
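As a rough sketch of the zero-downtime rollout behaviour described above, the example below uses the official kubernetes Python client to define a Deployment with a RollingUpdate strategy and maxUnavailable set to zero against a GKE cluster. The image, labels, replica count, and namespace are assumed values; in practice the same manifest is usually applied with kubectl or a CI/CD pipeline after running gcloud container clusters get-credentials.

```python
# Sketch: a Deployment with a zero-downtime rolling-update strategy on a GKE
# regional cluster, via the official `kubernetes` Python client. Image name,
# labels, and namespace are illustrative assumptions; credentials come from the
# kubeconfig entry created by `gcloud container clusters get-credentials`.
from kubernetes import client, config


def deploy():
    config.load_kube_config()  # reuse the kubeconfig entry for the GKE cluster
    apps = client.AppsV1Api()

    container = client.V1Container(
        name="logistics-api",
        image="europe-docker.pkg.dev/my-project/apps/logistics-api:1.4.2",
        ports=[client.V1ContainerPort(container_port=8080)],
        readiness_probe=client.V1Probe(
            http_get=client.V1HTTPGetAction(path="/healthz", port=8080)),
    )
    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name="logistics-api"),
        spec=client.V1DeploymentSpec(
            replicas=6,
            selector=client.V1LabelSelector(match_labels={"app": "logistics-api"}),
            strategy=client.V1DeploymentStrategy(
                type="RollingUpdate",
                rolling_update=client.V1RollingUpdateDeployment(
                    max_surge="25%", max_unavailable=0),  # keep full capacity during rollouts
            ),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": "logistics-api"}),
                spec=client.V1PodSpec(containers=[container]),
            ),
        ),
    )
    apps.create_namespaced_deployment(namespace="default", body=deployment)


if __name__ == "__main__":
    deploy()
```

With the readiness probe and maxUnavailable=0, old pods are only removed once new pods pass health checks, which is how the rolling update stays invisible to users.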
Question 93:
A fintech startup needs a centralised logging solution to store, search, analyse, and alert on logs for compliance and auditing. They require log retention options, integration with SIEM tools, real-time monitoring, and low-latency search. Which service best meets these needs?
A) Cloud Logging
B) Cloud Monitoring
C) BigQuery
D) Cloud SQL
Answer: A)
Explanation:
A fintech startup handling sensitive financial transactions must maintain strict logging and auditing capabilities. A centralised logging platform must collect logs from application services, databases, networking components, and user activity. It must support security event detection, retention policies, and real-time analysis. A fully managed logging service supports ingestion from GCP services, custom applications, and external systems. It allows teams to write structured queries to filter logs, build dashboards, generate alerts, and detect unusual behaviour.
Fintech environments often require compliance with regulations such as PCI DSS, which mandates secure storage, access control, and auditability of logs. A dedicated logging platform supports fine-grained IAM, enabling developers, auditors, and security analysts to access appropriate log sets. Retention policies can be configured to store logs for required compliance periods. Integration with SIEM tools allows centralised security analysis. The service can export logs to Pub/Sub, BigQuery, or Cloud Storage for extended analytics, archival, or machine learning.
A monitoring platform focuses on metrics, uptime checks, and resource usage. It does not specialise in log storage or complex text search. While useful for performance monitoring, it cannot satisfy compliance requirements for log retention and auditing.
A data warehouse provides excellent analytics capabilities but is not intended as a real-time logging engine. Ingesting high volumes of application logs directly would create cost inefficiencies, latency issues, and operational challenges. A relational database is unsuitable for large-scale logging due to limited throughput, storage costs, and the lack of real-time querying capabilities for log data.
A centralised logging platform also supports alerting. For example, if unusual login activity occurs, alerts can be triggered through Monitoring or Pub/Sub. Fintech platforms must monitor for fraud indicators, unusual admin operations, or suspicious API calls. The service provides real-time log routing and supports log-based metrics, which can be used to trigger alerts. Combined, these features make it the ideal choice for secure, compliant logging and auditing.
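A minimal sketch with the google-cloud-logging client is shown below: it writes a structured audit entry and then queries it back with the same filter syntax used in Logs Explorer and log sinks. The logger name, fields, and filter values are assumptions.

```python
# Sketch: writing a structured audit log entry and querying it back with a filter,
# using the google-cloud-logging client. Logger name and fields are assumptions.
from google.cloud import logging as cloud_logging

client = cloud_logging.Client()
logger = client.logger("payments-audit")

# Structured entries keep fields queryable (jsonPayload.*) for auditors and SIEM export.
logger.log_struct(
    {"event": "admin_login", "user": "ops@example.com", "source_ip": "203.0.113.7"},
    severity="NOTICE",
)

# Low-latency search with the standard Logging filter syntax.
log_filter = (
    'logName:"payments-audit" AND jsonPayload.event="admin_login" '
    'AND timestamp>="2024-01-01T00:00:00Z"'
)
for entry in client.list_entries(filter_=log_filter, order_by=cloud_logging.DESCENDING):
    print(entry.timestamp, entry.payload)
```

The same filter expression can drive a log-based metric or a sink to Pub/Sub or BigQuery for SIEM integration and long-term retention.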
Question 94:
A retail analytics company needs to run large-scale Spark and Hadoop jobs for batch processing. They want a managed service but still need full control over cluster configuration, custom libraries, and job-level tuning. Which GCP service should they use?
A) Dataflow
B) Dataproc
C) BigQuery
D) Cloud Functions
Answer: B)
Explanation:
A retail analytics company running large-scale Spark and Hadoop jobs must rely on a managed environment that still provides deep configurability. Their workloads likely include sales forecasting, customer segmentation, recommendation engines, clickstream analysis, and inventory optimisation. These tasks require processing huge datasets, applying machine learning models, and transforming data in multiple stages. They also require fine-tuned control over cluster settings such as executor memory, CPU allocation, YARN or standalone configurations, custom JARs, and dependency management. A managed Hadoop and Spark environment that automates cluster creation, scaling, and tear-down while still allowing full access to the underlying nodes is ideal. This service enables engineers to bootstrap clusters with custom initialisation scripts, install additional libraries, and adjust cluster topology to match performance requirements. Jobs can run on ephemeral clusters that auto-delete after completion, optimising cost and operational efficiency. This is important when dealing with workloads that occur on a scheduled cadence, such as nightly sales aggregations or monthly forecasting models.
Batch processing frameworks like Spark require fine control over partitioning, shuffle behaviour, and memory tuning. The retail analytics team can modify these parameters directly on the cluster. They can integrate with Cloud Storage for durable input and output datasets, enabling large-scale ETL processing without managing HDFS storage overhead. Additionally, this cluster-based solution integrates naturally with Jupyter notebooks, job workflow tools, Hive, Pig, Presto, and other ecosystem components. Data scientists and engineers benefit from flexible tooling and familiar open-source frameworks.
A serverless Beam-based ETL framework is ideal for unified batch and streaming pipelines, but is not intended for jobs dependent on specific Spark/Hadoop features or requiring deep cluster-level configuration. While it provides powerful data processing capabilities, teams cannot install arbitrary libraries at the cluster level or tune Spark executors. Retail analytics teams often rely on advanced Spark MLlib features or custom machine learning libraries that require native code execution on the cluster.
A serverless data warehouse excels at SQL-based analytics but is not optimised for running Spark or Hadoop workloads directly. Although it integrates with Spark through connectors, it does not serve as a Hadoop runtime environment. Retail organisations wanting to port existing Spark pipelines must retain the ability to run Spark jobs natively.
A serverless function is designed for lightweight, event-driven tasks. It cannot support long-running Spark jobs or large-scale distributed data processing. Functions also lack control over distributed computing parameters such as worker count, executor memory, and cluster-level execution strategies.
Because the analytics company needs both managed operations and full control over cluster internals, the managed Hadoop and Spark service is the correct choice. It provides operational efficiency, flexibility, compatibility with legacy pipelines, and excellent integration with the broader GCP ecosystem.
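A short sketch of submitting a tuned Spark job to an existing Dataproc cluster with the google-cloud-dataproc client is shown below; the project, region, cluster name, JAR path, and Spark properties are assumed values, chosen to illustrate the job-level tuning discussed above.

```python
# Sketch: submitting a tuned Spark job to an existing Dataproc cluster.
# Project, region, cluster, jar path, and Spark properties are assumptions.
from google.cloud import dataproc_v1

project_id, region, cluster = "my-project", "us-central1", "retail-batch"

job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

job = {
    "placement": {"cluster_name": cluster},
    "spark_job": {
        "main_class": "com.example.retail.NightlySalesAggregation",
        "jar_file_uris": ["gs://retail-artifacts/jobs/sales-agg.jar"],
        "args": ["--date=2024-06-01"],
        # Cluster- and job-level tuning that serverless alternatives do not expose:
        "properties": {
            "spark.executor.memory": "12g",
            "spark.executor.cores": "4",
            "spark.sql.shuffle.partitions": "400",
        },
    },
}

operation = job_client.submit_job_as_operation(
    request={"project_id": project_id, "region": region, "job": job}
)
print(operation.result().driver_output_resource_uri)  # location of the driver logs
```

For scheduled nightly runs, the same job definition can target an ephemeral cluster created with initialisation actions and deleted after completion.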
Question 95:
A cybersecurity firm needs to build an event ingestion pipeline for billions of security logs per day. They need a globally distributed messaging service with low latency, high throughput, and guaranteed message delivery. Which GCP service should they use?
A) Pub/Sub
B) Cloud Tasks
C) Cloud Storage
D) Memorystore
Answer: A)
Explanation:
A cybersecurity firm dealing with massive security logs requires a messaging system capable of handling global ingestion with high throughput and low latency. Security logs from firewalls, IDS/IPS systems, endpoint agents, cloud workloads, and network devices are often emitted continuously and can spike dramatically during cyberattacks. A messaging service capable of automatically scaling to millions of messages per second ensures that ingestion remains stable even during traffic surges. Global availability ensures logs can be collected from datacenters, branch offices, and cloud environments around the world.
A publish-subscribe messaging platform supports guaranteed at-least-once delivery and integrates seamlessly with downstream processors such as Dataflow, BigQuery, SIEMs, or ML-driven threat detection systems. It provides durable message storage and ensures that logs are processed even if downstream systems experience outages. Security monitoring systems often require ordered processing of related events, and the platform provides message ordering keys when needed. It also supports dead-letter topics to handle malformed messages, which is useful in cybersecurity workloads where malformed logs must still be investigated.
A task management service is designed for asynchronous task execution, not high-throughput log ingestion. It is limited by rate controls and is optimised for background job queues, not real-time streaming of billions of events.
Object storage supports durable storage of logs, but is not a messaging or event ingestion system. It does not provide low-latency message delivery, ordered processing, or push/pull subscription mechanisms required for real-time cybersecurity analysis.
An in-memory cache is suitable for low-latency lookups or caching threat intelligence datasets, but it is not capable of handling sustained ingestion workloads or ensuring message durability.
The messaging platform also integrates with VPC Service Controls, IAM policies, and encryption features essential for securing sensitive cybersecurity logs. Combined with Dataflow, the firm can build real-time threat detection pipelines leveraging windowed aggregations, anomaly detection, and ML model inference. With BigQuery, the firm can perform retrospective analysis to investigate breaches or correlate multi-day attack patterns. The service’s global availability, elasticity, durability, and ordering capabilities make it the most suitable choice for a cybersecurity logging pipeline.
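A minimal publishing sketch with the google-cloud-pubsub client is shown below, using an ordering key so events from the same device are delivered in order; the project, topic, and event fields are assumptions.

```python
# Sketch: publishing security events with an ordering key so per-device logs are
# processed in order. Topic and project names are illustrative assumptions.
import json

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient(
    publisher_options=pubsub_v1.types.PublisherOptions(enable_message_ordering=True)
)
topic_path = publisher.topic_path("my-project", "security-events")

event = {"device": "fw-eu-01", "action": "deny", "src": "198.51.100.9", "port": 22}
future = publisher.publish(
    topic_path,
    data=json.dumps(event).encode("utf-8"),
    ordering_key=event["device"],   # keeps per-firewall ordering
    source="perimeter-firewall",    # attributes help downstream routing/filtering
)
print(future.result())  # message ID once the publish is acknowledged
```

On the subscriber side, the same ordering key guarantees that a Dataflow or SIEM consumer sees each device's events in the sequence they were published.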
Question 96:
A mobile gaming company wants to implement a user personalisation engine that requires real-time user profiles, session data, and leaderboards. They need a NoSQL database with millisecond latency and horizontal scaling. Which database is the best fit?
A) Cloud SQL
B) Bigtable
C) Firestore
D) Cloud Spanner
Answer: C)
Explanation:
A mobile gaming company seeking to build a real-time personalisation engine must support low-latency access to user profiles, session states, friends lists, and leaderboard records. These types of workloads often involve semi-structured data with rapidly evolving schemas. A NoSQL document database with automatic scaling, strong consistency at the document level, and millisecond reads suits these requirements well. It enables developers to store user data flexibly, update session context quickly, and serve personalisation decisions instantly to players around the world.
Gaming workloads also require synchronisation across user devices—for example, when updating inventory, player stats, in-game currency, or session progress. A NoSQL database with built-in real-time listeners enables updates to propagate seamlessly, ensuring consistent user experiences. Developers benefit from simple APIs and the ability to handle nested documents without rigid schemas. This allows them to quickly adjust game logic without extensive schema migrations.
A relational SQL database provides strong consistency and ACID transactions, but cannot horizontally scale to millions of concurrent users in real-time gaming scenarios. It is best suited for transactional or structured workloads, but becomes a bottleneck under high read/write volume.
A wide-column NoSQL database excels at time-series and analytics workloads with massive throughput. However, it is not optimised for hierarchical user documents or synchronous read/write patterns common in gaming profiles. It works best for telemetry, metrics, or large analytical datasets, not per-user dynamic profile data.
A globally distributed relational database offers strong consistency and global availability. While it is extremely powerful for transactional workloads, its cost and architectural design are not ideal for rapidly changing user documents or nested structures typical in gaming data. It is also more complex than necessary for per-user state storage.
The document-oriented NoSQL backend provides a strong balance between ease of development, low latency, flexible schema design, and large-scale reliability. It integrates seamlessly with serverless backends, mobile SDKs, and authentication systems to support player identity management. Real-time updates make it perfect for leaderboards that must reflect changing scores instantly. Its autoscaling capabilities ensure that sudden influxes of players—such as during promotional events or game launches—do not overwhelm the system. Therefore, this NoSQL document database is the best solution.
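A brief google-cloud-firestore sketch of the pattern described above follows: a merge-style profile update plus a real-time listener of the kind used for leaderboards and session sync. Collection, document, and field names are assumptions.

```python
# Sketch: updating a player profile document and listening for real-time changes
# with the google-cloud-firestore client. Collection and field names are assumptions.
from google.cloud import firestore

db = firestore.Client()
profile_ref = db.collection("players").document("player_8421")

# merge=True updates only the supplied fields, so the schema can evolve freely.
profile_ref.set(
    {"display_name": "NovaFox", "level": 37, "session": {"zone": "eu-west", "active": True}},
    merge=True,
)

# Real-time listener: fires whenever the profile (score, session state, etc.) changes.
def on_profile_change(doc_snapshots, changes, read_time):
    for doc in doc_snapshots:
        print("profile updated:", doc.to_dict())

watch = profile_ref.on_snapshot(on_profile_change)
```

The mobile SDKs expose the same listener model, which is what keeps leaderboards and session state synchronised across devices.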
Question 97:
Your company is deploying a machine learning pipeline that needs to train large-scale TensorFlow models on GPUs. The team requires automatic hyperparameter tuning, managed training infrastructure, and the ability to deploy models as online prediction endpoints. Which GCP service provides all these capabilities?
A) Cloud Run
B) AI Platform (Vertex AI)
C) Compute Engine
D) Cloud Functions
Answer: B)
Explanation:
A machine learning pipeline requiring GPU-accelerated training, large-scale distributed model execution, automatic hyperparameter tuning, and the ability to deploy prediction endpoints needs a platform specifically designed for the full ML lifecycle. Training TensorFlow models at scale requires optimised infrastructure that supports GPU and TPU hardware, distributed training strategies, and efficient handling of large datasets. A managed ML platform streamlines these tasks by abstracting away cluster management, hardware provisioning, model versioning, artefact tracking, and endpoint deployment. Engineers and data scientists can train models using built-in hyperparameter tuning capabilities, experiment tracking tools, and integration with scalable data sources. The service provides automated pipelines that orchestrate preprocessing, training, evaluation, and deployment steps. Its built-in features support batch predictions, online real-time inference, and monitoring for model drift or performance degradation.
A serverless workload runner is designed for stateless microservices, containerised APIs, and web backends. While it can host small inference models, it does not support large-scale distributed model training or GPU/TPU allocations. It also lacks features such as managed hyperparameter tuning, dataset versioning, and built-in model monitoring. It is appropriate for hosting lightweight prediction endpoints but not for full-fledged ML pipeline management.
A general VM-based solution can technically run TensorFlow workloads because GPUs can be attached to virtual machines. However, it requires manual management of the entire ML infrastructure. Teams must install drivers, configure distributed training clusters, handle scaling, manage logging, and build their own job scheduling systems. It does not include built-in hyperparameter tuning, pipeline automation, or integrated model deployment. It becomes overly complex for organisations wanting an end-to-end ML lifecycle solution.
A serverless function execution environment is optimised for short-lived processes such as event handlers, lightweight transformations, or API logic. It does not support GPUs, long-running training jobs, or large-scale data ingestion for ML workloads. It also cannot deploy persistent prediction endpoints with autoscaling and traffic splitting. Therefore, it is unsuitable for ML training or enterprise-grade model serving.
The managed ML platform solves all these challenges by providing a unified environment for preprocessing, training, tuning, and deployment. It integrates with tools such as Feature Store, Workbench notebooks, AutoML, and Pipelines. It supports custom training jobs using Docker images, enabling advanced TensorFlow, PyTorch, JAX, or Scikit-Learn workflows. It offers GPU and TPU hardware with automatic scaling. It allows model deployment as fully managed endpoints with request-based autoscaling. Model monitoring features detect drift, skew, and data quality issues. Artefact tracking ensures reproducibility, while metadata stores help auditors and engineers review model lineage. Hyperparameter tuning uses Bayesian search or grid strategies to optimise model performance. The service also integrates with Dataflow and BigQuery for data preparation and supports continuous training pipelines. With all these features built into a managed environment, it clearly satisfies all the needs of the ML team and is the correct choice for large-scale TensorFlow training workloads.
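A condensed sketch with the google-cloud-aiplatform SDK follows, showing a GPU-backed custom training job and deployment to an autoscaled online endpoint (hyperparameter tuning is omitted for brevity). The container images, bucket, machine types, and display names are assumed values.

```python
# Sketch: a GPU-backed custom training job and an online endpoint on Vertex AI.
# Bucket, container images, and display names are illustrative assumptions.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1",
                staging_bucket="gs://my-ml-staging")

job = aiplatform.CustomContainerTrainingJob(
    display_name="tf-demand-forecast",
    container_uri="us-docker.pkg.dev/my-project/ml/tf-trainer:latest",
    model_serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-12:latest"),
)

model = job.run(
    replica_count=2,                       # distributed training
    machine_type="n1-standard-8",
    accelerator_type="NVIDIA_TESLA_T4",    # GPU-accelerated workers
    accelerator_count=1,
    model_display_name="demand-forecast",
)

endpoint = model.deploy(machine_type="n1-standard-4",
                        min_replica_count=1, max_replica_count=5)
prediction = endpoint.predict(instances=[[0.12, 0.48, 0.91]])
print(prediction.predictions)
```

Hyperparameter tuning, pipelines, and model monitoring are configured as separate Vertex AI resources around the same training container.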
Question 98:
A logistics company needs to optimise delivery routes in near real time. Their solution requires processing thousands of incoming updates per second from delivery vehicles, performing stream processing, and storing aggregated results for quick lookup. Which architecture best fits their needs?
A) Pub/Sub → Dataflow → Bigtable
B) Cloud SQL → Cloud Functions → BigQuery
C) Cloud Storage → Dataproc → Cloud SQL
D) Memorystore → Compute Engine → Cloud Storage
Answer: A)
Explanation:
A logistics company dealing with thousands of updates per second from delivery vehicles requires a highly scalable, low-latency, real-time data processing architecture. Delivery vehicles continuously send updates such as GPS coordinates, speed, traffic conditions, fuel status, stop confirmations, and environmental sensor readings. These updates must be ingested reliably and processed in real time to compute accurate ETAs, optimise routes dynamically, and provide visibility to dispatch teams. A messaging layer that can handle global event ingestion at a massive scale is ideal for collecting updates from vehicles across wide geographic regions. This messaging system ensures reliable delivery, supports ordering when necessary, and integrates smoothly with stream processing systems.
A fully managed streaming engine processes these updates using windowing, stateful transformations, and aggregations. It allows the logistics company to compute rolling metrics, detect delays, re-route drivers based on traffic, and update downstream systems. The system supports exactly-once processing and integrates with the messaging layer to create a seamless streaming pipeline. With proper tuning, the stream processor handles complex event patterns, sudden spikes in load, and real-time enrichment using external datasets. It also enables dynamic computation of metrics such as average speeds, distance remaining, traffic congestion patterns, and expected arrival times.
A NoSQL wide-column database provides extremely low-latency reads and writes, horizontal scaling, and high throughput. The aggregated and enriched results from the stream processor can be stored in this database for rapid lookup. Dispatch systems, mobile apps, and dashboards can query it in milliseconds to obtain the latest delivery statuses, locations, or predictions. Wide-column databases are particularly well-suited for time-series, IoT, and telemetry workloads, making them ideal for storing rapid vehicle updates.
A relational database paired with a serverless function cannot handle thousands of updates per second with low latency. It is likely to bottleneck under high write loads, and it lacks the real-time streaming capabilities needed for complex route optimisations. A storage-based Hadoop system is suited for batch processing, not real-time streaming. It introduces delays and lacks the low-latency requirements of route optimisation. An in-memory cache combined with virtual machines is useful for caching and microservices, but it cannot support the ingestion, transformation, and long-term storage required for a complete analytics pipeline.
The combination of a global ingestion system, a managed stream processing engine, and a scalable NoSQL store allows the logistics company to build a highly responsive, real-time pipeline capable of optimising thousands of deliveries simultaneously.
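A small sketch of the serving-side write follows: storing a per-vehicle aggregate produced by the streaming job into Bigtable for millisecond lookups, using the google-cloud-bigtable client. The instance, table, column family, and row-key pattern are assumptions.

```python
# Sketch: writing a per-vehicle aggregate computed by the streaming job into Bigtable
# for millisecond lookups. Instance, table, and column names are assumptions.
import datetime

from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=False)
table = client.instance("fleet-serving").table("vehicle_status")

vehicle_id = "van-0042"
# Row key pattern: vehicle id plus a time bucket keeps each vehicle's rows contiguous.
row_key = f"{vehicle_id}#{datetime.datetime.utcnow():%Y%m%d%H%M}".encode("utf-8")

row = table.direct_row(row_key)
row.set_cell("stats", "avg_speed_kmh", b"47.3")
row.set_cell("stats", "eta_minutes", b"12")
row.set_cell("stats", "last_stop", b"depot-7")
row.commit()
```

Dispatch dashboards then read the latest row per vehicle with a simple prefix scan, keeping lookups in the single-digit-millisecond range.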
Question 99:
A healthcare company needs to store sensitive patient records and must ensure compliance with HIPAA. They want a relational database with automatic encryption, IAM integration, automated backups, point-in-time recovery, and minimal operational overhead. Which database should they choose?
A) Cloud Spanner
B) Cloud SQL
C) BigQuery
D) Firestore
Answer: B)
Explanation:
A healthcare organisation storing sensitive patient records must adhere to strict regulatory standards such as HIPAA, which require secure storage, auditing, encryption, restricted access controls, and data durability. They need a relational database because healthcare systems traditionally rely on structured schemas to represent patients, appointments, clinical notes, billing records, medication lists, and lab results. A relational engine provides ACID compliance, strong transactional guarantees, and referential integrity. A managed relational database service with automatic patching, monitoring, backups, and security controls dramatically reduces operational overhead while maintaining compliance.
A globally distributed relational database is extremely powerful but is generally used for applications requiring global consistency and horizontal scalability beyond regional limits. While it supports strong consistency and automatic encryption, it is more complex to operate and more expensive than necessary for typical healthcare applications. Most healthcare workloads do not require global distribution or unlimited horizontal scaling. Additionally, integration with third-party EHR systems is often easier with traditional relational models rather than globally partitioned systems.
A serverless analytical data warehouse is optimised for large-scale analytics rather than transactional patient record storage. While it supports encryption and regulatory compliance, it is not designed as a primary OLTP database. It uses columnar storage, which is not optimal for frequent granular inserts, updates, and small transactions common in patient record systems. It is ideal for population-level analytics but not for storing live patient data.
A document-based NoSQL database supports flexible schemas and real-time updates, but does not provide the relational consistency guarantees required for healthcare data. Healthcare applications often depend on complex joins, transactional updates, and strict data integrity constraints. Additionally, developers may face challenges in ensuring compliance workflows and maintaining transactional consistency in deeply nested documents.
The managed relational database service offers encryption at rest and in transit, automated backups with point-in-time recovery, and seamless IAM integration. It also supports private IP connectivity, VPC Service Controls, auditing, and maintenance windows. It reduces operational complexity because patching, failover, monitoring, and replication are handled automatically. Healthcare companies benefit from read replicas, high availability configurations, and automated failover. The service meets HIPAA regulatory requirements when used within a properly configured environment and under a signed Business Associate Agreement (BAA). Its familiarity, compatibility with standard SQL engines, and robust tooling make it the ideal choice for patient record systems requiring compliance, reliability, and strong security controls.
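A minimal connection sketch using the cloud-sql-python-connector with SQLAlchemy follows, which keeps traffic encrypted and IAM-authorised without exposing public IPs. The instance connection name, database, credentials, and schema are assumptions; in practice secrets would come from Secret Manager or IAM database authentication.

```python
# Sketch: connecting to a Cloud SQL for PostgreSQL instance through the
# cloud-sql-python-connector. Instance name, credentials, and schema are assumptions.
import sqlalchemy
from google.cloud.sql.connector import Connector

connector = Connector()

def getconn():
    # The connector handles TLS and IAM authorisation instead of raw IP allow-lists.
    return connector.connect(
        "my-project:us-central1:patient-records",  # instance connection name
        "pg8000",
        user="ehr_app",
        password="change-me",   # prefer Secret Manager / IAM DB auth in practice
        db="clinical",
    )

engine = sqlalchemy.create_engine("postgresql+pg8000://", creator=getconn)

with engine.connect() as conn:
    rows = conn.execute(
        sqlalchemy.text("SELECT patient_id, last_visit FROM patients WHERE patient_id = :pid"),
        {"pid": "P-10382"},
    )
    for row in rows:
        print(row)
```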
Question 100:
A financial trading platform needs ultra-low latency reads and writes for time-series market data. They must store tick-level updates, support extremely high throughput, and scale horizontally with predictable performance. Which GCP database should they use?
A) Cloud SQL
B) Bigtable
C) Firestore
D) Cloud Spanner
Answer: B)
Explanation:
A financial trading platform requires a database that can ingest and retrieve massive volumes of tick-level market data with extremely low and predictable latency. Market data arrives at sub-second intervals from stock exchanges, brokers, and trading engines, representing trades, bids, asks, spreads, depth-of-book updates, and price fluctuations. Such workloads involve time-series patterns with append-heavy writes and rapid sequential reads. A horizontally scalable NoSQL wide-column database with predictable performance is ideal for this scenario. Its architecture provides high throughput at low latency, enabling billions of rows to be stored efficiently. It also allows wide tables with multiple column families, making it suitable for partitioning data by symbol, timestamp, or region. Its HBase-compatible API and large cluster capability let financial analytics teams run complex workloads without worrying about scaling limits.
A relational SQL database cannot support the ingestion rate required for tick-level market feeds. Financial data streams can exceed hundreds of thousands of updates per second. Relational engines also struggle with ultra-low latency read/write patterns at this scale. While they provide strong transactional guarantees, they cannot keep up with the throughput required for real-time trading analytics.
A document database is optimised for flexible hierarchical documents, not sequential time-series data requiring microsecond-level access patterns. Though it can scale for user profiles or application metadata, it is not appropriate for continuous append-heavy ingestion of market data. Its consistency model and indexing approach introduce unnecessary overhead for workloads requiring predictable high-volume writes.
A globally distributed relational system provides excellent consistency and availability but suffers from higher latency than required for ultra-low latency analytics. Global synchronisation across replicas introduces overhead that is unsuitable for financial trading workloads needing deterministic performance. Additionally, this database is optimised for transactional workloads rather than high-throughput time-series inserts.
The wide-column NoSQL platform supports petabyte-scale datasets, time-series modelling, and high-throughput ingestion. Rows can be keyed using instrument symbols combined with timestamps, enabling efficient scanning for recent updates. Trading teams can compute real-time indicators, market trends, and volatility metrics efficiently. It integrates seamlessly with Dataflow for streaming ingestion, Pub/Sub for market event pipelines, and BigQuery for analytics. The database supports replication, cluster resizing, and high availability without sacrificing performance. Its consistent latency ensures analytics dashboards, trading engines, and alerting systems can access market updates instantly. For use cases like tick-level data feeds, order book snapshots, or historical replay systems, this database remains the industry-aligned choice. Therefore, it is the correct database for ultra-low latency financial market workloads.
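A short read-side sketch with the google-cloud-bigtable client follows, assuming rows are keyed as "symbol#timestamp" so a prefix scan returns one instrument's recent ticks; instance, table, and column names are assumptions.

```python
# Sketch: reading recent ticks for one instrument from Bigtable, assuming rows
# are keyed as "<symbol>#<timestamp>" so a prefix scan bounds the read.
from google.cloud import bigtable
from google.cloud.bigtable.row_set import RowSet

client = bigtable.Client(project="my-project")
table = client.instance("market-data").table("ticks")

row_set = RowSet()
# All rows for GOOG: the '#' prefix confines the scan to a single instrument.
row_set.add_row_range_from_keys(start_key=b"GOOG#", end_key=b"GOOG#~")

for row in table.read_rows(row_set=row_set, limit=100):
    cell = row.cells["quote"][b"last_price"][0]
    print(row.row_key.decode(), cell.value.decode())
```

Choosing the row key carefully (symbol first, then time) is what keeps these scans fast and hotspot-free as ingestion scales.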
Question 101:
A biotech research team needs to run large-scale genomics pipelines using containers. Their workflows must process terabytes of sequencing data, run on preemptible VMs to reduce cost, and orchestrate tasks across many compute nodes. Which GCP service best supports this?
A) Cloud Run
B) Cloud Batch
C) Cloud Functions
D) App Engine
Answer: B)
Explanation:
A biotech research team performing large-scale genomics analysis needs a batch processing service capable of orchestrating containerised workloads across many compute nodes. These workloads often involve alignment, variant calling, normalisation, and quality control steps, each requiring CPU-intensive processing and sometimes GPU acceleration. Sequencing datasets can reach terabytes in size, especially when analysing whole genomes or running population-scale studies. A managed batch service is ideal because it provides controlled orchestration of jobs, container-based execution, integration with object storage, and efficient scheduling. It supports running on preemptible VMs, which reduces costs significantly for long-running compute-intensive tasks that can tolerate interruptions. The batch system handles job retries, dependency graphs, parallel task scheduling, and resource scaling automatically.
A serverless container execution environment is designed for stateless microservices, APIs, and lightweight data processing. It does not support long-running batch jobs or workflows requiring orchestration of thousands of tasks. It also cannot use preemptible VMs or provide fine-grained scheduling across compute nodes.
A serverless function platform is ideal for lightweight, event-driven operations such as file processing triggers or notification logic. It cannot handle long-running genomics workloads, large container images, or distributed job orchestration. It lacks features like retries based on exit codes, job arrays, and workflow DAGs.
A managed application platform supports web applications and backend services, but not containerised batch pipelines. It does not support preemptible compute resources, distributed job execution, or large-scale parallel processing.
The managed batch service integrates with Cloud Storage for storing sequencing reads, intermediate files, and final results. It allows the biotech team to define job specifications, allocate CPUs, GPUs, or memory, and schedule thousands of tasks simultaneously. The service automatically balances jobs across compute clusters and ensures efficient utilisation. It also supports container images hosted in Artifact Registry, allowing researchers to package genomics tools like GATK, BWA, Samtools, DeepVariant, or custom ML pipelines into portable environments. Retries are automatic when using preemptible VMs, ensuring robust execution despite occasional interruptions. Scalable parallelisation allows pipelines to process large cohorts efficiently. The service’s auditing, logging, and monitoring features help validate results and ensure reproducibility. For genomics workloads requiring HPC-like orchestration, this batch platform is the correct solution.
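A compact sketch of such a job with the google-cloud-batch client follows, running a containerised alignment step across many Spot (preemptible-style) VMs with automatic task retries; the image, machine type, task counts, and IDs are assumed values.

```python
# Sketch: a containerised Cloud Batch job running a genomics step on Spot VMs.
# Image, machine type, task counts, and IDs are illustrative assumptions.
from google.cloud import batch_v1

client = batch_v1.BatchServiceClient()

runnable = batch_v1.Runnable(
    container=batch_v1.Runnable.Container(
        image_uri="us-docker.pkg.dev/my-project/genomics/bwa-align:1.0",
        commands=["--sample", "NA12878", "--out", "gs://genomics-results/NA12878/"],
    )
)
task = batch_v1.TaskSpec(
    runnables=[runnable],
    compute_resource=batch_v1.ComputeResource(cpu_milli=8000, memory_mib=32768),
    max_retry_count=3,  # re-run tasks that lose a Spot VM
)
job = batch_v1.Job(
    task_groups=[batch_v1.TaskGroup(task_count=500, parallelism=50, task_spec=task)],
    allocation_policy=batch_v1.AllocationPolicy(
        instances=[
            batch_v1.AllocationPolicy.InstancePolicyOrTemplate(
                policy=batch_v1.AllocationPolicy.InstancePolicy(
                    machine_type="n2-standard-8",
                    provisioning_model=batch_v1.AllocationPolicy.ProvisioningModel.SPOT,
                )
            )
        ]
    ),
    logs_policy=batch_v1.LogsPolicy(
        destination=batch_v1.LogsPolicy.Destination.CLOUD_LOGGING),
)

client.create_job(
    parent="projects/my-project/locations/us-central1",
    job=job,
    job_id="bwa-align-cohort-42",
)
```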
Question 102:
A streaming media company wants to store user viewing history, session metadata, and personalised recommendations. They need a globally distributed, strongly consistent relational database to ensure users see the same profile data no matter where they log in. Which service is the best fit?
A) Cloud SQL
B) Firestore
C) Bigtable
D) Cloud Spanner
Answer: D)
Explanation:
A streaming media company handling user viewing histories, session data, and personalised recommendations across multiple regions requires a globally consistent relational database. Users may access the platform from different countries, devices, or networks, and they expect their watchlists, preferences, and session states to remain consistent. A globally distributed database with strong consistency ensures updates propagate across regions without stale reads. This database supports high write throughput, automatic sharding, and synchronous replication across regions. Its relational model supports complex queries, joins, and structured relationships such as linking users to profiles, watchlists to content catalogues, and metadata to playback history.
A regional SQL database provides strong consistency within a single region but cannot replicate globally with the latency guarantees required by streaming platforms. Nor can it scale horizontally across regions. For global customers, this leads to latency spikes and inconsistency when accessing data from other continents.
A document database is useful for application metadata and nested structures, but does not provide multi-region strong consistency for relational workloads. While it scales globally, its consistency model varies and is not designed for strict relational integrity requirements. Personalised recommendations often involve joins or transactions that must remain consistent across all user devices.
A wide-column database is excellent for time-series or analytical data, but does not support relational integrity or globally consistent transactions. It works best for telemetry, logs, and high-throughput ingestion, not user profile storage requiring strong cross-region consistency.
The globally distributed relational database solves these challenges by providing horizontal scalability, strongly consistent reads and writes, SQL support, and automated failover. It partitions data automatically, handles multi-region replication, and enables low-latency access for global users. Streaming platforms can store watch histories, recommendation logs, and playback metadata with guaranteed consistency across regions. This ensures users receive the same experience everywhere while enabling real-time personalisation systems to operate reliably. Therefore, this database is the most suitable choice.
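A minimal google-cloud-spanner sketch of a strongly consistent, parameterised read joining a profile to its watch history follows; the instance, database, table, and column names are assumptions.

```python
# Sketch: a strongly consistent read of a user's profile and watch history from a
# multi-region Cloud Spanner database. Instance, database, and schema are assumptions.
from google.cloud import spanner

client = spanner.Client(project="my-project")
database = client.instance("media-global").database("profiles")

sql = """
    SELECT p.DisplayName, w.ContentId, w.WatchedAt
    FROM Profiles AS p
    JOIN WatchHistory AS w ON w.UserId = p.UserId
    WHERE p.UserId = @user_id
    ORDER BY w.WatchedAt DESC
    LIMIT 20
"""

with database.snapshot() as snapshot:  # strong read by default
    results = snapshot.execute_sql(
        sql,
        params={"user_id": "u_918273"},
        param_types={"user_id": spanner.param_types.STRING},
    )
    for display_name, content_id, watched_at in results:
        print(display_name, content_id, watched_at)
```

Because replication is synchronous, the same query returns identical results whether the user logs in from Europe, Asia, or the Americas.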
Question 103:
A fintech startup wants to detect fraudulent transactions in real time. They need to process millions of events per second from payment systems, apply custom rules and machine learning models, and store aggregated results for analytics. Which architecture should they use?
A) Pub/Sub → Dataflow → BigQuery
B) Cloud SQL → Cloud Functions → BigQuery
C) Cloud Storage → Dataproc → Cloud SQL
D) Memorystore → Compute Engine → Cloud Storage
Answer: A)
Explanation:
A fintech startup detecting fraud in real time needs a highly scalable and low-latency architecture. Fraud detection involves ingesting millions of events per second from payment gateways, transaction processors, or mobile apps. Each event must be validated, enriched, evaluated against rule-based systems, and possibly fed to ML models for anomaly scoring. A messaging layer capable of massive parallel ingestion ensures events are received reliably and ordered when necessary. A global, fully managed publish-subscribe system provides these features with strong delivery guarantees, horizontal scaling, and integration with downstream systems.
A managed stream processing engine transforms and enriches data in real time. It handles windowing, joins, and stateful computations, enabling aggregation of metrics, detection of anomalies, and calculation of risk scores. It supports fault tolerance, exactly-once processing semantics, and automatic scaling, essential for handling unpredictable spikes in transaction volume. By using such a service, the fintech team avoids managing clusters, scaling infrastructure manually, or implementing complex distributed processing logic themselves. The streaming engine also integrates with ML model inference, enabling risk-based scoring or anomaly detection in real time.
A scalable analytical store enables the startup to store aggregated results for reporting, regulatory compliance, or retrospective analysis. This store supports fast SQL-based queries across large datasets, enabling fraud analysts to explore trends, detect patterns, and fine-tune detection models. Analytics may include rolling sums, counts of suspicious events per user, geographic patterns, or velocity-based fraud detection.
A relational database and serverless function architecture lacks the throughput to handle millions of events per second. Functions are ideal for event-triggered micro-tasks, but cannot perform large-scale streaming transformations reliably. Batch processing systems like Hadoop or Dataproc are designed for offline jobs and cannot deliver low-latency insights required for real-time fraud mitigation. An in-memory cache system is excellent for fast lookups or ephemeral computations, but cannot handle persistent storage or streaming pipelines at this scale.
Using this combination, the startup achieves end-to-end streaming ingestion, low-latency transformation and ML scoring, and durable analytics storage. The architecture can detect fraudulent transactions in real time, trigger alerts, and feed models to improve detection accuracy continuously. The system also provides compliance auditing and integration with visualisation dashboards. Therefore, the best solution is the messaging → stream processing → analytical store pipeline.
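A small google-cloud-bigquery sketch of the analytics layer follows: a velocity-style query over the aggregates written by the streaming pipeline. The dataset, table, and column names are assumptions.

```python
# Sketch: a velocity-style fraud query over aggregates landed by the streaming
# pipeline. Dataset, table, and column names are illustrative assumptions.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

sql = """
    SELECT card_id,
           COUNT(*)                AS txn_count,
           COUNT(DISTINCT country) AS countries,
           SUM(amount)             AS total_amount
    FROM `my-project.fraud.transactions_agg`
    WHERE event_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 10 MINUTE)
    GROUP BY card_id
    HAVING txn_count > 20 OR countries > 2
    ORDER BY total_amount DESC
"""

for row in client.query(sql).result():
    print(row.card_id, row.txn_count, row.countries, row.total_amount)
```

Analysts can run the same query from dashboards, while the streaming pipeline handles the per-event scoring that needs sub-second latency.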
Question 104:
A healthcare research institute wants to analyse genomic data across multiple projects. They require a serverless, petabyte-scale data warehouse with SQL support, automatic scaling, and no infrastructure management. Which service should they choose?
A) BigQuery
B) Cloud SQL
C) Dataproc
D) Firestore
Answer: A)
Explanation:
A healthcare research institute working with genomics requires a data platform capable of analysing vast datasets efficiently. Genomic data is extremely large, often measured in terabytes per experiment, and must be queried alongside other clinical and experimental datasets. A serverless, petabyte-scale analytical platform enables scientists to run complex SQL queries on structured or semi-structured genomic data without provisioning clusters or managing hardware. The platform automatically scales compute resources to handle workloads of varying intensity, including large-scale joins, aggregations, or statistical computations.
A traditional relational SQL database is not suitable for petabyte-scale genomics analysis. While it ensures ACID compliance and supports transactional queries, it cannot handle extremely large datasets efficiently or scale dynamically for massive batch queries. Performance can degrade as the dataset size grows, requiring costly sharding, indexing, or manual tuning.
A cluster-based Hadoop/Spark solution can process large data, but requires infrastructure management, cluster scaling, and job scheduling. Researchers would need to manage compute nodes, handle failures, tune Spark jobs, and ensure job retries. While it provides flexibility, operational complexity is high and can slow the pace of research.
A NoSQL document database is optimised for flexible hierarchical data but is not designed for complex analytical queries or large-scale joins required for genomic research. It is better suited for transactional workloads, content storage, or real-time document access rather than analytical genomics pipelines.
The serverless analytical warehouse integrates with cloud storage, allowing researchers to query raw genomic files directly without moving data. It supports federated queries, machine learning integration, and visualisation tools. Scientists can run genome-wide association studies, variant analysis, or population-level aggregation with high performance. The system also supports cost-efficient storage separation, where compute and storage are decoupled, so large datasets do not incur unnecessary compute charges when not being queried. Security, encryption, and audit logging ensure compliance with regulatory standards for sensitive genomic data. Automatic caching, columnar storage, and query optimisation further improve performance. The platform reduces operational overhead, enabling researchers to focus on scientific analysis rather than infrastructure. In summary, the serverless analytical warehouse meets all requirements for genomics research: SQL support, automatic scaling, high performance, and zero infrastructure management.
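A brief google-cloud-bigquery sketch of a federated query over variant files left in Cloud Storage follows, so the data is analysed in place rather than loaded first; the bucket path, file format, column names, and quality threshold are assumptions.

```python
# Sketch: querying Parquet variant files in Cloud Storage directly from BigQuery
# via a temporary external table. Bucket, schema, and thresholds are assumptions.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

external_config = bigquery.ExternalConfig("PARQUET")
external_config.source_uris = ["gs://genomics-lake/cohort42/variants/*.parquet"]

job_config = bigquery.QueryJobConfig(
    table_definitions={"variants": external_config}  # temporary external table
)

sql = """
    SELECT chrom, COUNT(*) AS variant_count
    FROM variants
    WHERE qual >= 30
    GROUP BY chrom
    ORDER BY variant_count DESC
"""

for row in client.query(sql, job_config=job_config).result():
    print(row.chrom, row.variant_count)
```

For frequently queried cohorts, the same files can instead be loaded into native tables to benefit from columnar storage and caching.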
Question 105:
A gaming company wants to store real-time player scores, achievements, and session data. They need a highly scalable NoSQL database with millisecond latency, automatic scaling, and strong consistency for critical user data. Which database should they use?
A) Cloud SQL
B) Firestore
C) Bigtable
D) Cloud Spanner
Answer: B)
Explanation:
A gaming company storing player scores, session states, and achievements requires a database that can provide extremely low-latency access for millions of concurrent users. The database must scale automatically to accommodate sudden spikes, such as during game launches or competitive events. Strong consistency ensures that users see the latest scores, progress, or rewards without conflicts, providing a reliable and fair gameplay experience. A document-oriented NoSQL database offers a flexible schema suitable for hierarchical user data, such as nested achievements, inventory, or session attributes. It also supports millisecond reads and writes, enabling real-time updates across multiple devices and platforms.
A relational SQL database provides strong consistency but struggles to scale to millions of concurrent players without complex sharding, replication, or load-balancing strategies. It is also less suitable for hierarchical or semi-structured data typical of modern gaming systems.
A wide-column NoSQL database excels at high-throughput time-series and telemetry data, but it lacks a built-in document hierarchy, and implementing strong transactional consistency for small, granular units is more complex. It is better suited for logging or analytical pipelines than for individual, user-centric transactional data.
A globally distributed relational database provides strong consistency and horizontal scaling, but operational overhead and cost may be high for per-user session data that can change rapidly. It may also be more complex to integrate with mobile SDKs and real-time listeners required for live multiplayer games.
The document-oriented NoSQL solution provides automatic scaling, strong consistency, offline support for mobile devices, and real-time updates through listeners. This ensures leaderboards, achievements, and session data remain synchronised across devices, regions, and platforms. Built-in integration with authentication and security controls allows developers to protect sensitive user data. Automatic replication ensures availability even during regional outages, and its flexible schema accommodates evolving game features without complex migrations. This combination makes it the best choice for real-time gaming user data storage.
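A short google-cloud-firestore sketch of a transactional score update follows, which keeps concurrent writes from different devices consistent; collection, document, and field names are assumptions.

```python
# Sketch: a transactional score update so concurrent writes from multiple devices
# never drop points. Collection and field names are illustrative assumptions.
from google.cloud import firestore

db = firestore.Client()

@firestore.transactional
def add_score(transaction, player_ref, points):
    snapshot = player_ref.get(transaction=transaction)
    current = (snapshot.to_dict() or {}).get("score", 0)
    transaction.update(player_ref, {"score": current + points})

player_ref = db.collection("players").document("player_8421")
add_score(db.transaction(), player_ref, 150)
print(player_ref.get().to_dict())
```

If two devices submit scores at the same time, the transaction retries automatically, so the leaderboard total reflects both updates.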