Google Professional Machine Learning Engineer Exam Dumps and Practice Test Questions Set 2 Q16-30
Question 16
You are designing a machine learning system to predict equipment failure in a factory using IoT sensor data. Sensor readings are collected every second, producing high-dimensional time series data. Which approach is most suitable for modeling this data?
A) Use a feedforward neural network on raw sensor readings
B) Apply a recurrent neural network (RNN) or LSTM
C) Aggregate sensor data and use linear regression
D) Randomly sample a subset of sensors and use k-means clustering
Answer: B
Explanation:
Using a feedforward neural network on raw sensor readings can capture complex nonlinear relationships, but it is not ideal for sequential or temporal data. Feedforward networks treat each input independently, ignoring the temporal dependencies present in time series sensor data. Since IoT sensors generate readings over time, crucial patterns, trends, or correlations across time steps may be lost. As a result, the model may fail to recognize temporal signals indicative of impending failures, limiting its predictive accuracy.
Applying a recurrent neural network (RNN) or long short-term memory (LSTM) network is well-suited for high-dimensional time series data. RNNs are designed to process sequential information, maintaining internal states that capture temporal dependencies. LSTMs enhance RNNs by incorporating gating mechanisms that prevent the vanishing gradient problem, allowing the network to learn long-term dependencies effectively. In the context of predictive maintenance, patterns leading up to equipment failure may span minutes or hours, and LSTMs can capture these complex sequences. By modeling the temporal structure of sensor readings, RNNs or LSTMs can detect subtle changes, trends, or anomalies, improving early warning accuracy.
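As a rough illustration, the sketch below builds a small LSTM classifier over fixed-length windows of multivariate sensor readings with tf.keras; the window length, sensor count, layer sizes, and the binary "failure within horizon" label are assumptions made for the example, not values from the question.

```python
# Minimal sketch: an LSTM over windows of multivariate sensor readings.
# Inputs are assumed to be (batch, timesteps, num_sensors) windows with a binary label.
import tensorflow as tf

TIMESTEPS, NUM_SENSORS = 300, 64   # e.g. 5 minutes of 1 Hz readings from 64 sensors

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(TIMESTEPS, NUM_SENSORS)),
    tf.keras.layers.LSTM(128, return_sequences=True),   # keep per-step outputs
    tf.keras.layers.LSTM(64),                            # summarize the whole window
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),      # P(failure within horizon)
])
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[tf.keras.metrics.AUC()],
)
# model.fit(train_windows, train_labels, validation_data=(val_windows, val_labels))
```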
Aggregating sensor data and using linear regression simplifies the problem by reducing dimensionality and temporal resolution. While averaging or summarizing readings may reduce noise and computational burden, it discards sequential information that is often critical for failure prediction. Linear regression assumes a linear relationship and cannot capture complex temporal patterns, making it unsuitable for high-frequency, multi-sensor data where nonlinear interactions and trends over time are essential.
Randomly sampling a subset of sensors and using k-means clustering is primarily an unsupervised approach suitable for detecting groups or patterns in data. While clustering may help identify typical operating states, it does not provide predictive capabilities for failure events. Randomly selecting sensors can also ignore important features, further reducing model effectiveness. Clustering alone cannot leverage the temporal information needed to forecast future failures.
Recurrent neural networks, particularly LSTMs, are the most appropriate approach because they explicitly model sequential dependencies and can learn complex temporal patterns in high-dimensional sensor data. This allows the system to capture subtle precursors to failures, maintain robustness across variable time scales, and provide actionable predictions for proactive maintenance in industrial environments.
Question 17
You are training a large transformer-based language model. The model overfits quickly on the training data despite having access to a large corpus. Which strategy is most effective for improving generalization?
A) Increase the batch size and remove regularization
B) Introduce dropout and use weight decay
C) Remove early stopping and train longer
D) Reduce the size of the dataset
Answer: B
Explanation:
Increasing the batch size and removing regularization may stabilize training and speed up convergence, but they do not address overfitting. Without regularization, the model has excessive capacity and may memorize training examples instead of learning generalizable patterns. Large batch sizes reduce gradient noise but can also lead to poorer generalization in transformer models, particularly if no techniques are applied to counteract overfitting. Training on the large corpus without constraints can still result in memorization rather than robust language understanding.
Introducing dropout and using weight decay is an effective strategy to improve generalization. Dropout randomly deactivates neurons during training, forcing the model to learn redundant representations and reducing reliance on specific paths. Weight decay adds a penalty to large parameter values, discouraging extreme weights and preventing overfitting. Together, these techniques improve robustness, maintain performance on unseen text, and mitigate the risk of memorization, which is particularly important in large transformer architectures with millions or even billions of parameters. Dropout and weight decay are standard regularization practices in modern NLP training pipelines.
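As a minimal sketch of these two knobs, the snippet below configures dropout inside a toy-sized PyTorch transformer encoder and decoupled weight decay in AdamW; the dimensions and hyperparameter values are illustrative assumptions, not tuned settings.

```python
# Minimal sketch: dropout inside the network, weight decay in the optimizer.
import torch
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(
    d_model=512, nhead=8, dim_feedforward=2048,
    dropout=0.1,            # dropout on attention and feed-forward outputs
    batch_first=True,
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)

# AdamW applies decoupled weight decay, the usual choice for transformers;
# 0.01 is a common starting point, not a prescription.
optimizer = torch.optim.AdamW(encoder.parameters(), lr=3e-4, weight_decay=0.01)
```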
Removing early stopping and training longer exacerbates overfitting. Early stopping monitors validation performance and halts training when improvements plateau, preventing the model from over-optimizing on the training set. Eliminating this safeguard allows the model to fit noise in the training data, degrading performance on validation or test sets. Extended training without constraints in large models with high capacity amplifies memorization and overfitting, which is counterproductive to generalization.
Reducing the size of the dataset may simplify training, but it is counterproductive when overfitting is the problem. Smaller datasets provide less information and increase the risk of memorization, especially in large transformer models. Overfitting is more likely to occur on smaller datasets because the model has enough capacity to memorize every training example. Instead, leveraging the full dataset with regularization techniques is more effective for improving generalization.
Introducing dropout and weight decay is the most effective approach because it directly addresses the overfitting issue. It allows the transformer to leverage its large capacity while learning robust patterns that generalize to unseen data. These techniques are widely used in production-scale NLP models to improve performance, prevent memorization, and ensure stable, reliable language understanding across diverse inputs.
Question 18
You are building a computer vision system for defect detection on a manufacturing assembly line. Images are high-resolution, and defects occupy small regions of the image. Which modeling approach is most suitable?
A) Use a standard convolutional neural network on downsampled images
B) Apply a region-based CNN (R-CNN) or object detection framework
C) Use a fully connected network on raw pixel values
D) Reduce image resolution and apply a simple linear classifier
Answer: B
Explanation:
Using a standard convolutional neural network on downsampled images may reduce computational requirements, but it risks losing critical information about small defects. Downsampling can blur or eliminate fine details, making it difficult for the model to detect minor defects accurately. While CNNs are effective for general image classification, their performance suffers when small, localized features are critical, as is the case with defect detection on high-resolution images.
Applying a region-based CNN (R-CNN) or object detection framework is highly appropriate for this task. R-CNN and its variants, such as Faster R-CNN or Mask R-CNN, are designed to detect and localize objects within images. They use region proposal networks to focus on specific areas likely to contain relevant features, allowing the model to identify small defects without downsampling the entire image. Object detection frameworks combine localization with classification, making them ideal for scenarios where defects occupy only small portions of the image. These models can learn hierarchical features from high-resolution data, maintain spatial precision, and provide bounding boxes for detected defects, supporting actionable quality control decisions.
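For a concrete picture, the sketch below follows the common torchvision pattern of loading a pretrained Faster R-CNN and swapping its box-classification head for a background-vs-defect head; the two-class setup and the weights argument are assumptions about the deployment, not details given in the question.

```python
# Minimal sketch: fine-tune a pretrained Faster R-CNN for defect detection.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the classification head: class 0 = background, class 1 = defect.
num_classes = 2
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# After training, inference returns per-image boxes, labels, and scores:
# model.eval(); predictions = model([image_tensor])
```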
Using a fully connected network on raw pixel values is impractical for high-resolution images. Fully connected networks treat every pixel independently, requiring an enormous number of parameters and extensive computation. They also fail to exploit spatial relationships between pixels, which are crucial for detecting localized defects. This approach is computationally expensive and unlikely to converge effectively on complex visual patterns, making it unsuitable for defect detection.
Reducing image resolution and applying a simple linear classifier further loses critical information. Small defects may disappear entirely at lower resolutions, and linear classifiers cannot capture complex visual patterns or textures. This approach is too simplistic for industrial defect detection, where high accuracy and precise localization are necessary to maintain quality standards.
Region-based CNNs or object detection frameworks are the most suitable approach because they explicitly model spatial relationships, focus on small regions, and maintain the high-resolution details necessary for defect detection. By combining localization and classification, these models provide accurate, actionable predictions and are widely used in industrial computer vision applications for quality assurance.
Question 19
You are designing a predictive model for stock price movement. The data is highly noisy and non-stationary, with trends and sudden shocks. Which modeling approach is most appropriate?
A) Simple linear regression on historical prices
B) Recurrent neural networks (RNN) or LSTM with sliding windows
C) Use the last observed price as the prediction
D) Clustering past price movements without forecasting
Answer: B
Explanation:
Simple linear regression on historical prices assumes a linear relationship between input features and the target variable. In the context of stock prices, this assumption rarely holds because financial time series are highly volatile, nonlinear, and influenced by numerous unpredictable factors. Linear regression cannot capture complex temporal patterns, sudden shocks, or long-term dependencies. Additionally, using only past prices without modeling trends or seasonality leads to poor predictive performance. While linear regression is easy to implement and interpret, it is insufficient for high-frequency or complex stock prediction tasks.
Recurrent neural networks (RNN) or LSTM with sliding windows are well-suited for modeling non-stationary and noisy time series like stock prices. RNNs maintain internal memory of past states, allowing them to capture temporal dependencies. LSTMs further enhance this capability with gates that control the flow of information, enabling the model to retain long-term patterns while ignoring irrelevant noise. Sliding windows provide sequences of past observations as inputs, helping the model learn dependencies over specific time intervals. This approach allows the network to identify trends, momentum, and abrupt changes, improving forecasting capability despite the noise inherent in financial data.
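The snippet below sketches only the sliding-window step: a synthetic price series is converted to log returns and sliced into fixed-length input windows with a next-step target, ready to feed an LSTM. The window length, the synthetic series, and the choice of log returns are illustrative assumptions.

```python
# Minimal sketch: build sliding-window samples from a price series.
import numpy as np

def make_windows(series: np.ndarray, window: int):
    """Return (X, y) with X[i] = series[i:i+window] and y[i] = series[i+window]."""
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X[..., np.newaxis], y              # add a feature axis: (samples, window, 1)

rng = np.random.default_rng(0)
prices = 100.0 + np.cumsum(rng.normal(0.0, 0.5, size=1000))   # stand-in for real prices
returns = np.diff(np.log(prices))             # log returns are closer to stationary
X, y = make_windows(returns, window=30)
print(X.shape, y.shape)                       # (969, 30, 1) (969,)
```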
Using the last observed price as the prediction is a naive baseline approach. While it may achieve minimal error in extremely short-term forecasts when markets are relatively stable, it fails to account for trends, seasonal effects, or sudden shocks. Reliance on the last price is inadequate for actionable trading strategies or long-term predictions because financial markets are dynamic and affected by multiple exogenous variables. This method essentially ignores the richness of historical data and temporal patterns that advanced models can exploit.
Clustering past price movements without forecasting identifies patterns or groups of similar sequences, but it does not provide a predictive framework. Clustering can be useful for exploratory analysis, identifying regimes, or grouping stocks with similar behavior, but it cannot generate future price predictions. Without combining clustering with forecasting or regression models, this approach does not solve the primary task of predicting stock price movements.
RNNs or LSTMs with sliding windows are the most appropriate approach because they explicitly model temporal dependencies and are robust to noise in sequential data. They can learn complex patterns in stock price movements, retain long-term dependencies, and provide more accurate forecasts than naive methods. This approach is widely used in algorithmic trading and financial time series prediction due to its ability to handle nonlinearity, non-stationarity, and sudden shocks effectively.
Question 20
You are developing a recommendation system for an online learning platform. Users have varying numbers of interactions, and new courses are added frequently. Which technique is most suitable for addressing cold-start issues?
A) Pure collaborative filtering based on user-item interactions
B) Content-based filtering using course features and user profiles
C) Recommend courses randomly
D) Recommend only the most popular courses
Answer: B
Explanation:
Pure collaborative filtering based on user-item interactions relies entirely on historical interactions to recommend items. While effective for users and courses with substantial interaction data, it struggles with the cold-start problem. New users have no interaction history, and new courses have no engagement data. Collaborative filtering cannot make meaningful recommendations in these scenarios because it depends on prior data to compute similarities between users or items. As a result, it fails to provide relevant suggestions for new users or courses, limiting its applicability in a dynamic online learning platform.
Content-based filtering using course features and user profiles is well-suited for cold-start scenarios. Content-based methods leverage metadata such as course topic, difficulty level, length, instructor expertise, and user preferences or profile information. By matching users with courses that align with their interests or skill levels, the system can provide personalized recommendations without requiring historical interaction data. This approach addresses both new users and new courses, ensuring that the platform delivers relevant content even in the absence of prior engagement. Content-based filtering can also complement collaborative methods in hybrid systems, providing more robust recommendations.
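As a small illustration of the idea, the sketch below represents courses by their metadata text, builds a user profile from stated interests, and ranks courses by cosine similarity; the course descriptions and profile text are made-up examples.

```python
# Minimal sketch: content-based ranking with TF-IDF features and cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

courses = [
    "introductory python programming for beginners",
    "advanced deep learning with neural networks",
    "watercolor painting techniques for hobbyists",
    "machine learning foundations: regression and classification",
]
user_profile = "interested in python and machine learning basics"   # works with zero interactions

vectorizer = TfidfVectorizer()
course_vecs = vectorizer.fit_transform(courses)
profile_vec = vectorizer.transform([user_profile])

scores = cosine_similarity(profile_vec, course_vecs).ravel()
ranking = scores.argsort()[::-1]              # best-matching courses first
print([courses[i] for i in ranking])
```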
Recommending courses randomly provides exposure to items but does not tailor suggestions to user preferences. While it may occasionally introduce new courses to users, it fails to engage learners effectively, reduces user satisfaction, and is unlikely to drive meaningful learning outcomes. Random recommendations are inefficient for platforms aiming to maximize user engagement and retention.
Recommending only the most popular courses is a simplistic approach that biases recommendations toward items with high engagement. While it can provide some relevance for users, it ignores individual preferences and does not address the cold-start problem for new courses. Popularity-based recommendations may dominate the system, reducing diversity and limiting exposure to new or niche courses.
Content-based filtering is the most suitable technique for addressing cold-start issues because it relies on intrinsic features of items and users rather than historical interactions. It enables personalized recommendations for both new users and new courses, maintaining engagement and relevance. This approach ensures that the recommendation system remains effective in dynamic and rapidly evolving environments, supporting user learning and retention while providing meaningful exposure to new content.
Question 21
You are building a machine learning pipeline to classify medical images into multiple disease categories. The dataset is imbalanced, with some diseases much rarer than others. Which evaluation metric is most appropriate?
A) Accuracy
B) Macro-averaged F1 score
C) Mean squared error
D) Root mean squared error
Answer: B
Explanation:
Accuracy measures the proportion of correct predictions out of total predictions. While commonly used, accuracy is misleading for imbalanced datasets. In medical image classification, if the majority of images belong to common diseases, a model that predicts only the majority class could achieve high accuracy while failing to detect rare but critical diseases. This provides an inflated sense of performance and overlooks the importance of identifying minority classes, which may represent life-threatening conditions.
The macro-averaged F1 score computes the F1 score, the harmonic mean of precision and recall, for each class separately and then averages these per-class scores. This metric treats all classes equally, regardless of their frequency, ensuring that rare disease categories are given appropriate weight in evaluation. By balancing precision (correct positive predictions) and recall (ability to identify actual positive cases), the macro F1 score provides a more comprehensive view of model performance across all disease categories. It is particularly suitable in medical settings where correctly identifying rare conditions is critical for patient outcomes.
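The short example below, with made-up labels, shows how accuracy can look acceptable while the macro-averaged F1 score exposes a rare class that the model never predicts.

```python
# Minimal sketch: accuracy vs. macro F1 on an imbalanced three-class problem.
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]   # the rare class 2 is never predicted

print("accuracy:", accuracy_score(y_true, y_pred))          # 0.80, looks fine
print("macro F1:", f1_score(y_true, y_pred, average="macro",
                            zero_division=0))               # ~0.55, exposes the miss
```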
Mean squared error (MSE) is a regression metric that measures the average squared difference between predicted and actual values. It is not appropriate for classification problems because it does not evaluate the correctness of categorical predictions. Using MSE for multi-class classification can be misleading and fails to capture the model’s ability to detect minority classes, making it unsuitable for medical image classification tasks.
Root mean squared error (RMSE) is the square root of MSE and shares the same limitations. It is designed for continuous numeric targets rather than categorical outcomes. RMSE does not reflect class-level performance, sensitivity, or the ability to detect rare disease cases, and thus, it is not appropriate for imbalanced multi-class classification problems.
Macro-averaged F1 score is the most appropriate evaluation metric because it balances the importance of precision and recall across all classes. It ensures that the model’s performance on rare disease categories is not overshadowed by majority classes, providing a more accurate and clinically relevant assessment. This metric supports model selection, hyperparameter tuning, and validation in medical image classification pipelines, ensuring equitable performance and patient safety.
Question 22
You are designing a real-time object detection system for autonomous vehicles. The model must process high-resolution images at low latency. Which approach is most suitable?
A) Use a standard deep convolutional neural network for classification
B) Apply a lightweight, real-time object detection model like YOLO or SSD
C) Downsample images and apply k-means clustering
D) Use a fully connected neural network on raw pixels
Answer: B
Explanation:
Using a standard deep convolutional neural network (CNN) for classification is effective for identifying the presence of objects in an image, but it does not provide localization. Classification CNNs output a single label per image and cannot provide bounding boxes or coordinates for multiple objects, which is essential for autonomous driving. Moreover, standard CNN architectures designed for classification tend to be computationally heavy, making real-time inference on high-resolution images challenging due to latency constraints. This approach is insufficient for real-time object detection requirements in autonomous vehicles.
Applying a lightweight, real-time object detection model like YOLO (You Only Look Once) or SSD (Single Shot MultiBox Detector) is highly suitable for this scenario. These models are specifically designed to detect and localize multiple objects in an image simultaneously while maintaining low latency. YOLO divides the image into a grid and predicts bounding boxes and class probabilities in a single forward pass, enabling real-time performance. SSD also predicts objects at multiple scales with a single pass, providing high accuracy and speed. Both approaches are optimized for deployment in resource-constrained environments such as autonomous vehicles, allowing the system to react quickly to dynamic driving conditions while processing high-resolution images effectively.
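As a rough sketch, the snippet below runs a lightweight single-shot detector from torchvision on one frame and applies a confidence threshold; the model and weights names follow recent torchvision releases, the random tensor stands in for a camera frame, and the timing print only illustrates the latency concern.

```python
# Minimal sketch: single-pass detection on one frame with a lightweight SSD.
import time
import torch
import torchvision

model = torchvision.models.detection.ssdlite320_mobilenet_v3_large(weights="DEFAULT")
model.eval()

frame = torch.rand(3, 720, 1280)          # stand-in for a camera frame scaled to [0, 1]
with torch.no_grad():
    start = time.perf_counter()
    detections = model([frame])[0]        # dict with 'boxes', 'labels', 'scores'
print(f"inference took {time.perf_counter() - start:.3f}s")

keep = detections["scores"] > 0.5         # simple confidence threshold
print(detections["boxes"][keep], detections["labels"][keep])
```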
Downsampling images and applying k-means clustering may reduce computational load, but it sacrifices the critical spatial resolution needed to detect small or distant objects. Clustering identifies groups of similar pixels or regions but does not provide object localization, classification, or bounding boxes. This approach is inadequate for real-time decision-making in safety-critical applications where accurate object detection is essential. It is more suited to exploratory analysis or segmentation than to autonomous driving.
Using a fully connected neural network on raw pixels is computationally infeasible for high-resolution images. Fully connected networks require a separate weight for every pixel in the input layer, resulting in an enormous number of parameters. They also ignore spatial correlations in images, which are critical for object detection. This makes them highly inefficient and unsuitable for real-time applications, particularly when multiple objects need to be detected and localized in a dynamic environment.
Lightweight real-time detection models like YOLO or SSD are the most appropriate approach because they provide the best balance between speed and accuracy. They can detect multiple objects at high resolution with low latency, maintain spatial precision, and support the rapid decision-making required for autonomous driving. Their architecture is optimized to handle high-resolution inputs efficiently, making them practical for deployment in real-world scenarios where safety and performance are critical.
Question 23
You are building a model to predict hospital patient length of stay. The dataset contains numerical, categorical, and time-dependent features. Some features are missing for a subset of patients. Which preprocessing strategy is most appropriate?
A) Drop all records with missing data
B) Impute missing values using statistical methods or domain knowledge
C) Replace missing values with zeros and ignore feature types
D) Remove categorical and time-dependent features
Answer: B
Explanation:
Dropping all records with missing data is a straightforward approach, but it can significantly reduce the dataset size and bias the results. In medical datasets, missing values often occur non-randomly, such as missing lab tests for less severe patients. Removing these records could lead to a dataset that is unrepresentative of the patient population and reduce model generalization. Additionally, losing valuable records decreases statistical power, particularly if the dataset is not extremely large.
Imputing missing values using statistical methods or domain knowledge is a standard and effective strategy. Numerical missing values can be imputed using mean, median, or more sophisticated methods such as k-nearest neighbors or model-based imputations. Categorical missing values can be handled using the most frequent category, indicator variables, or probabilistic imputations. Domain knowledge is particularly valuable in healthcare, where missingness can carry meaningful information. For example, the absence of certain tests might indicate patient stability. Imputation ensures that all records are retained while maintaining the integrity of the dataset, allowing the model to learn effectively from complete feature representations.
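A minimal scikit-learn sketch of this idea is shown below: numeric columns get median imputation with a missingness indicator, and categorical columns get most-frequent imputation followed by one-hot encoding. The column names are hypothetical.

```python
# Minimal sketch: per-type imputation inside a preprocessing pipeline.
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

numeric_cols = ["age", "lab_creatinine", "hours_since_admission"]     # hypothetical names
categorical_cols = ["diagnosis_code", "admission_type"]               # hypothetical names

numeric_pipe = SimpleImputer(strategy="median", add_indicator=True)   # keep the missingness signal
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer([
    ("num", numeric_pipe, numeric_cols),
    ("cat", categorical_pipe, categorical_cols),
])
# X_train_prepared = preprocess.fit_transform(X_train)
```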
Replacing missing values with zeros and ignoring feature types is generally inappropriate. Zero imputation introduces artificial bias, particularly for numerical features where zero is not a meaningful or plausible value. Ignoring feature types further compounds the problem, as categorical or time-dependent features may be misrepresented, resulting in poor model performance. This approach can distort relationships in the data and lead to unreliable predictions, especially in sensitive domains like healthcare.
Removing categorical and time-dependent features sacrifices valuable predictive information. Categorical features, such as diagnosis codes or procedure types, often carry crucial signals for predicting length of stay. Time-dependent features, such as time since admission or previous interventions, are important for modeling trends and trajectories in patient health. Removing these features simplifies the problem but significantly reduces predictive power and model interpretability.
Imputing missing values using statistical methods and domain knowledge is the most appropriate preprocessing strategy because it retains all available data while handling incomplete information intelligently. This approach preserves valuable signals from numerical, categorical, and temporal features, ensures that patterns are accurately represented, and allows the model to generalize effectively to unseen patients. It balances completeness with data integrity, which is essential for healthcare applications where patient outcomes depend on accurate predictions.
Question 24
You are building a fraud detection system for online transactions. Fraudulent transactions are extremely rare, and misclassifying them has a high cost. Which evaluation metric is most appropriate?
A) Accuracy
B) Precision, recall, and F1 score
C) Mean absolute error
D) Root mean squared error
Answer: B
Explanation:
Accuracy measures the proportion of correct predictions out of all predictions. While commonly used in balanced datasets, accuracy is misleading in highly imbalanced scenarios like fraud detection. If fraudulent transactions represent 1% of the dataset, a model that predicts all transactions as non-fraud will achieve 99% accuracy but fail at detecting actual fraud. Relying on accuracy alone can give a false sense of performance and does not capture the ability to identify the minority class, which is the primary goal in fraud detection.
Precision, recall, and F1 score are suitable metrics for imbalanced datasets. Precision measures the proportion of correctly identified fraud cases out of all predicted frauds, reflecting the model’s reliability in avoiding false positives. Recall measures the proportion of actual fraud cases detected, reflecting the model’s ability to capture rare events. The F1 score combines precision and recall as their harmonic mean, balancing the trade-off between detecting fraud and minimizing false alarms. These metrics provide a comprehensive view of model performance and are critical when the cost of misclassification is high, such as financial losses or regulatory violations.
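The toy example below (with 1 = fraud) shows how the three metrics read on a classifier that flags only a fraction of frauds; the label arrays are made up to make the trade-off visible.

```python
# Minimal sketch: class-focused metrics on an imbalanced fraud problem.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0] * 95 + [1] * 5
y_pred = [0] * 95 + [1, 1, 0, 0, 0]       # catches 2 of 5 frauds, no false alarms

print("precision:", precision_score(y_true, y_pred))   # 1.00 - every flag is real fraud
print("recall:   ", recall_score(y_true, y_pred))      # 0.40 - most fraud is still missed
print("F1:       ", f1_score(y_true, y_pred))          # ~0.57 - balances the two
```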
Mean absolute error (MAE) is a regression metric that measures the average absolute difference between predicted and actual values. It is not suitable for classification problems like fraud detection, as it does not reflect the model’s ability to correctly identify rare fraudulent transactions or handle class imbalance. MAE evaluates numeric deviations, which are irrelevant for categorical prediction tasks.
Root mean squared error (RMSE) is also a regression metric and shares the same limitations as MAE. While RMSE penalizes larger errors more heavily, it is not informative for classification, particularly in imbalanced datasets. RMSE does not differentiate between false positives and false negatives, which is crucial in fraud detection, where the impact of errors varies depending on the class.
Precision, recall, and F1 score are the most appropriate evaluation metrics for fraud detection because they focus on the minority class, which is of primary concern. They provide actionable insights into the trade-offs between catching fraudulent transactions and minimizing false positives, ensuring that the model is both effective and practical in real-world deployment. By emphasizing these metrics, the system can optimize for business and operational objectives while mitigating the risk of undetected fraud.
Question 25
You are building a natural language processing (NLP) model to classify legal documents into multiple categories. Some documents are very long, and important information may appear anywhere in the text. Which approach is most appropriate?
A) Use a bag-of-words model on the first 500 words only
B) Use a transformer-based model with attention mechanisms
C) Train a simple recurrent neural network on truncated sequences
D) Use a TF-IDF vectorizer without considering word order
Answer: B
Explanation:
Using a bag-of-words model on the first 500 words only is limiting because it ignores the rest of the document. Legal documents often contain critical clauses or key information scattered throughout the text, and truncating to the first 500 words risks losing this information. Bag-of-words also ignores word order and context, which are essential for understanding legal language, potentially reducing classification performance significantly. While simple and computationally cheap, this approach is inadequate for long, structured documents with complex dependencies.
Using a transformer-based model with attention mechanisms is highly suitable for this task. Transformers, such as BERT, RoBERTa, or LegalBERT, can process entire sequences and use self-attention to focus on relevant parts of the text, regardless of their position. Attention mechanisms allow the model to weigh important words and phrases more heavily, capturing context and dependencies that may occur far apart in the text. This is particularly critical in legal documents, where the meaning of one clause may depend on another section many paragraphs away. Transformers can handle long-range dependencies and complex language structures, making them well-suited for multi-class classification of lengthy legal texts.
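A minimal Hugging Face sketch of this setup is shown below; the checkpoint name, eight-category head, and example clause are assumptions, the classification head would still need fine-tuning on labeled documents, and truly long contracts would additionally require chunking or a long-context model such as Longformer, since BERT-style encoders truncate at 512 tokens.

```python
# Minimal sketch: a transformer classifier head over a pretrained encoder.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-uncased"          # a legal-domain checkpoint could be swapped in
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=8)

text = "This agreement shall terminate upon thirty days written notice ..."
inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():                     # predictions are meaningful only after fine-tuning
    logits = model(**inputs).logits       # shape: (1, num_labels)
predicted_category = int(logits.argmax(dim=-1))
```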
Training a simple recurrent neural network (RNN) on truncated sequences is limited by the difficulty RNNs have in retaining long-term dependencies. While RNNs process sequential information, they struggle with very long texts, even with LSTM or GRU variants, due to gradient vanishing or exploding problems. Truncating sequences further reduces the ability to capture critical information appearing later in the document. This approach may work for shorter texts, but it is inadequate for long, dense legal documents where important context can appear anywhere.
Using a TF-IDF vectorizer without considering word order provides a representation of term importance but ignores the sequence and context of words. While TF-IDF can work for simple text classification, legal documents often require an understanding of syntax, semantics, and relationships between terms. Ignoring word order reduces the model’s ability to differentiate between documents where meaning changes based on phrasing or negation, making TF-IDF alone insufficient for accurate multi-class classification.
Transformer-based models with attention mechanisms are the most appropriate approach because they handle long texts effectively, maintain contextual relationships across the entire document, and focus on the most relevant sections. This ensures accurate classification even when key information appears in different parts of lengthy legal documents, providing both interpretability and high performance.
Question 26
You are building a recommendation system for a music streaming service. Users can like songs, skip them, or listen repeatedly. The system must recommend new songs while maintaining user satisfaction. Which approach is most suitable?
A) Pure popularity-based recommendation
B) Reinforcement learning to optimize long-term user engagement
C) Randomly recommending songs
D) Collaborative filtering based only on historical plays
Answer: B
Explanation:
Pure popularity-based recommendation suggests songs solely based on overall popularity. While it can surface widely liked songs, it lacks personalization and fails to adapt to individual user preferences. Popularity alone cannot capture user-specific tastes or account for evolving interests. Recommending only popular songs may increase engagement initially, but risks user fatigue and dissatisfaction, particularly when the service aims to encourage discovery of new or niche content. Popularity-based approaches do not optimize for long-term engagement or repeated listening behavior.
Reinforcement learning (RL) to optimize long-term user engagement is highly suitable. In this scenario, user interactions—such as likes, skips, and repeat plays—serve as rewards, and the system can learn policies to maximize overall satisfaction over time. RL allows the system to balance exploration (introducing new songs) and exploitation (recommending known favorites), adapting recommendations based on user responses. By modeling sequential interactions, the system can anticipate future behavior and optimize for sustained engagement rather than short-term clicks, which is critical for a music streaming platform. RL frameworks can also incorporate constraints, such as diversity and novelty, ensuring a well-rounded user experience.
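The sketch below reduces the idea to an epsilon-greedy bandit: each song is an arm, likes, repeats, and skips are mapped to scalar rewards, and the policy trades off exploration against exploitation. A production recommender would use contextual or sequential RL, and all reward values and names here are illustrative.

```python
# Minimal sketch: epsilon-greedy exploration/exploitation over songs.
import random
from collections import defaultdict

EPSILON = 0.1
REWARDS = {"like": 1.0, "repeat": 1.5, "skip": -0.5}   # illustrative reward mapping

value = defaultdict(float)    # running estimate of each song's reward
plays = defaultdict(int)

def recommend(catalog):
    if random.random() < EPSILON:                  # explore: surface a new or unknown song
        return random.choice(catalog)
    return max(catalog, key=lambda s: value[s])    # exploit: best estimate so far

def update(song, feedback):
    plays[song] += 1
    value[song] += (REWARDS[feedback] - value[song]) / plays[song]   # incremental mean

catalog = ["song_a", "song_b", "song_c"]
song = recommend(catalog)
update(song, "like")          # feedback would come from real user interactions
```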
Randomly recommending songs introduces exploration but does not consider user preferences, making it inefficient. Random recommendations may occasionally discover new user favorites, but most recommendations will likely be irrelevant, reducing user satisfaction. This approach cannot optimize engagement or adapt to individual listening habits, making it impractical for production systems.
Collaborative filtering based only on historical plays relies on past user interactions to identify similar users or songs. While effective for recommending songs with sufficient interaction data, it struggles with new users (cold start) or new songs added to the catalog. Collaborative filtering may also reinforce existing patterns, reducing exposure to novel content, and it does not explicitly optimize for long-term engagement metrics. Users may receive predictable recommendations, limiting discovery and engagement over time.
Reinforcement learning is the most appropriate approach because it dynamically adapts to user behavior, balances exploration and exploitation, and optimizes long-term satisfaction. It can incorporate various reward signals, such as likes, skips, and repeat listens, allowing the system to learn strategies that maximize overall user engagement while recommending new songs. This approach is ideal for evolving user preferences and dynamic content environments.
Question 27
You are building a deep learning model for predicting crop yield based on satellite imagery. The dataset is large, but cloud cover sometimes obscures fields in images. Which strategy is most suitable?
A) Ignore images with clouds and train only on clear images
B) Use data augmentation and cloud-masking techniques
C) Replace cloudy regions with zeros
D) Train a simple linear model on raw pixel values
Answer: B
Explanation:
Ignoring images with clouds and training only on clear images reduces data availability and introduces selection bias. Many satellite images contain partial cloud cover, and removing them reduces the effective size of the training set, potentially excluding important seasonal or regional variations. Training only on clear images may also prevent the model from learning how to handle cloud-obscured data in production, limiting its robustness.
Using data augmentation and cloud-masking techniques is highly suitable. Cloud-masking algorithms identify and remove or adjust cloud-covered regions, allowing the model to focus on usable information. Data augmentation can simulate different environmental conditions, improving generalization and robustness. Together, these methods ensure the model learns meaningful patterns from both clear and partially obscured images, making predictions more reliable. Cloud-aware augmentation also reduces the impact of missing or noisy regions, enabling the system to leverage the full dataset effectively while maintaining accuracy.
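The sketch below shows the shape of such a pipeline with NumPy: cloud-like pixels are zeroed and a validity mask is appended as an extra band, then simple flips and rotations augment the tile. Real systems would use the imagery provider's cloud-probability product rather than the crude brightness threshold assumed here.

```python
# Minimal sketch: cloud masking plus simple geometric augmentation.
import numpy as np

def mask_clouds(tile: np.ndarray, threshold: float = 0.85) -> np.ndarray:
    """Zero out pixels whose mean reflectance looks cloud-like and append a validity band."""
    cloud = tile.mean(axis=-1) > threshold           # (H, W) boolean cloud mask
    masked = tile.copy()
    masked[cloud] = 0.0
    valid = (~cloud).astype(np.float32)[..., None]   # 1 = usable pixel
    return np.concatenate([masked, valid], axis=-1)  # the mask travels with the image

def augment(tile: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random flips and 90-degree rotations preserve field geometry."""
    if rng.random() < 0.5:
        tile = np.flip(tile, axis=0)
    if rng.random() < 0.5:
        tile = np.flip(tile, axis=1)
    return np.rot90(tile, k=int(rng.integers(4)), axes=(0, 1))

rng = np.random.default_rng(0)
tile = rng.random((128, 128, 4))          # stand-in for a 4-band satellite tile
prepared = augment(mask_clouds(tile), rng)
print(prepared.shape)                     # (128, 128, 5): bands + validity mask
```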
Replacing cloudy regions with zeros introduces artificial patterns that may mislead the model. Zero values are not representative of actual field conditions and may be interpreted as meaningful signals, leading to incorrect feature learning. This approach can degrade model performance and make predictions unreliable, particularly when cloud coverage varies across the dataset.
Training a simple linear model on raw pixel values ignores spatial structure and complex relationships in imagery. Linear models cannot capture spatial dependencies or multispectral patterns critical for crop yield prediction. Moreover, cloud interference may further reduce model effectiveness, as linear models lack mechanisms to handle missing or obscured information. This approach is inadequate for high-resolution satellite imagery where advanced feature extraction is essential.
Using data augmentation combined with cloud-masking techniques is the most appropriate strategy because it addresses cloud interference while preserving valuable information. It allows the model to generalize across varying conditions, maintains robustness to occlusions, and leverages the full dataset effectively. This ensures reliable crop yield predictions and improves model performance in real-world satellite imagery applications.
Question 28
You are building a sentiment analysis system for social media posts. Posts contain slang, emojis, and misspellings. Which preprocessing technique is most effective for improving model performance?
A) Remove all non-alphanumeric characters
B) Apply text normalization, tokenization, and subword embeddings
C) Convert text to lowercase only
D) Ignore slang and emojis, focusing only on standard words
Answer: B
Explanation:
Removing all non-alphanumeric characters eliminates special characters, punctuation, and emojis. While this may reduce noise, it also discards important sentiment cues. Emojis often convey strong emotional content, and punctuation such as exclamation marks or question marks can modify the sentiment. Completely removing these elements risks losing significant information relevant to sentiment analysis, which may lead to reduced accuracy, particularly in informal social media contexts where these cues are prevalent.
Applying text normalization, tokenization, and subword embeddings is highly effective for this type of data. Text normalization standardizes spelling variations, corrects common misspellings, and handles informal language, ensuring that words with similar meanings are treated consistently. Tokenization breaks text into meaningful units, allowing models to process sequences efficiently. Subword embeddings, such as Byte-Pair Encoding (BPE) or WordPiece, handle rare words, slang, and neologisms by splitting them into smaller, learnable units. This allows the model to capture semantic meaning even for out-of-vocabulary terms and misspelled words. This combined approach ensures robust representations that maintain sentiment information across diverse, noisy inputs.
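A small sketch of this path is shown below: a lightweight normalization step handles elongated spellings and a tiny slang map, and a WordPiece tokenizer from Hugging Face stands in for the subword step. The slang dictionary and example post are made up, and a tweet-trained tokenizer would preserve the emoji rather than mapping it to an unknown token.

```python
# Minimal sketch: normalize informal text, then tokenize into subwords.
import re
from transformers import AutoTokenizer

SLANG = {"luv": "love", "gr8": "great", "tbh": "to be honest"}   # illustrative map

def normalize(text: str) -> str:
    text = text.lower()
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)                   # "soooo" -> "soo"
    return " ".join(SLANG.get(token, token) for token in text.split())

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

post = "I luv this soooo much 😍 gr8 job!!!"
clean = normalize(post)
tokens = tokenizer.tokenize(clean)        # subword pieces cover rare and misspelled words
print(clean)
print(tokens)
```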
Converting text to lowercase only is a minimal preprocessing step. While it reduces vocabulary size by treating “Happy” and “happy” identically, it does not address misspellings, slang, or emojis. Lowercasing alone is insufficient for social media text, which often contains non-standard expressions and symbols crucial for sentiment understanding. Relying solely on lowercase conversion does not improve model generalization or handle the complexity of user-generated content.
Ignoring slang and emojis and focusing only on standard words discards rich sentiment signals. Slang and emojis are often central to conveying tone and emotion in informal writing. Removing these elements reduces the model’s ability to understand user intent, which is critical for accurate sentiment classification. This approach would limit the model’s applicability in real-world social media scenarios, where informal language predominates.
Text normalization, tokenization, and subword embeddings are the most effective preprocessing techniques because they retain semantic meaning, handle rare or informal words, and allow the model to generalize across noisy social media text. This approach ensures that sentiment cues conveyed through slang, misspellings, and emojis are captured effectively, improving model performance and robustness in practical applications.
Question 29
You are deploying a deep learning image classifier in a mobile application. Users report slow inference and high battery consumption. Which optimization technique is most suitable?
A) Train a larger model for higher accuracy
B) Apply model quantization and pruning
C) Increase input image resolution for better feature extraction
D) Use a fully connected network instead of convolutional layers
Answer: B
Explanation:
Training a larger model for higher accuracy increases computational demand, memory usage, and inference latency. While larger models may achieve better accuracy in theory, they are impractical for mobile deployment due to limited processing power and battery constraints. Users would experience slower performance, and the device may overheat or drain its battery quickly. This approach prioritizes accuracy over efficiency, which is counterproductive in resource-constrained environments.
Applying model quantization and pruning is highly suitable for mobile deployment. Quantization reduces model precision, for example, converting weights from 32-bit floating point to 8-bit integers, decreasing memory footprint and computational requirements. Pruning removes redundant or low-importance connections and neurons, further reducing model size without significantly affecting accuracy. These techniques accelerate inference, reduce battery consumption, and enable deployment on devices with limited resources. Quantization-aware training can also maintain accuracy by compensating for reduced precision during training. Together, pruning and quantization provide an effective trade-off between performance and efficiency.
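As a rough sketch, the snippet below prunes 30% of the smallest-magnitude weights in each linear layer of a pretrained MobileNetV2 and then applies dynamic int8 quantization; the backbone, pruning amount, and quantized layer set are illustrative choices, and on-device deployment would usually export the result (e.g. to TorchScript or TFLite) afterwards.

```python
# Minimal sketch: magnitude pruning followed by dynamic int8 quantization.
import torch
import torch.nn.utils.prune as prune
import torchvision

model = torchvision.models.mobilenet_v2(weights="DEFAULT")
model.eval()

# Prune 30% of the smallest-magnitude weights in every linear layer.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")     # make the pruning permanent

# Dynamic quantization stores linear-layer weights as int8.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
# quantized(torch.rand(1, 3, 224, 224)) now runs with a smaller memory footprint.
```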
Increasing input image resolution for better feature extraction improves accuracy but increases computational cost, memory usage, and latency. High-resolution inputs require more operations for convolutional and pooling layers, exacerbating slow inference and battery drain. For mobile applications, higher resolution often provides diminishing returns relative to resource cost, making this approach unsuitable when efficiency is a priority.
Using a fully connected network instead of convolutional layers is inefficient for image data. Fully connected networks treat each pixel independently and require an enormous number of parameters for high-resolution images, leading to high memory usage and slow inference. Convolutional layers exploit spatial correlations, reduce parameters, and are computationally efficient, making them essential for practical image classification, particularly on mobile devices.
Model quantization and pruning are the most suitable optimization techniques because they reduce memory usage, computation, and power consumption while maintaining model accuracy. These methods enable real-time inference on mobile devices, improving user experience and allowing deep learning models to operate efficiently under constrained resources.
Question 30
You are building a predictive model for customer churn in a subscription service. The dataset is imbalanced, with far more non-churned than churned customers. Which approach is most effective?
A) Use standard accuracy as the evaluation metric
B) Apply class weighting or resampling techniques and evaluate with precision, recall, and F1 score
C) Remove non-churned customers to balance the dataset
D) Ignore the imbalance and train with the standard cross-entropy loss
Answer: B
Explanation:
Using standard accuracy as the evaluation metric is misleading in imbalanced datasets. If churned customers represent only 10% of the dataset, a model that predicts all customers as non-churned would achieve 90% accuracy but fail at identifying actual churn. Accuracy does not capture model performance on the minority class, which is critical for churn prediction. Relying on accuracy could lead to selecting models that appear performant but are ineffective in detecting customers at risk of leaving.
Applying class weighting or resampling techniques and evaluating with precision, recall, and F1 score is highly effective. Class weighting assigns higher importance to the minority class (churned customers) during training, penalizing misclassifications more heavily and encouraging the model to learn patterns associated with churn. Resampling techniques, such as oversampling churned customers or undersampling non-churned customers, balance the dataset, improving the model’s ability to detect rare events. Precision measures the correctness of predicted churns, recall measures the proportion of actual churns detected, and the F1 score balances both metrics. This combined approach ensures robust performance on the minority class while avoiding bias toward the majority class.
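The sketch below puts the pieces together on a synthetic dataset: "balanced" class weights make misclassified churners cost more during training, and evaluation reports precision, recall, and F1 per class; the logistic regression model and the roughly 10% churn rate are assumptions for illustration.

```python
# Minimal sketch: class weighting plus class-focused evaluation for churn.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=5000, n_features=20, weights=[0.9, 0.1], random_state=0
)   # roughly 10% churners (class 1)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression(class_weight="balanced", max_iter=1000)   # upweight the minority class
model.fit(X_train, y_train)

# Precision, recall, and F1 for class 1 (churn) are the numbers that matter here.
print(classification_report(y_test, model.predict(X_test), digits=3))
```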
Removing non-churned customers to balance the dataset reduces data availability and may discard valuable information about customer behavior. While it increases the proportion of churned customers, it introduces bias and limits the model’s ability to learn realistic patterns across the full population. Important trends and differences between churned and non-churned customers may be lost, reducing predictive accuracy and generalization.
Ignoring imbalance and training with standard cross-entropy loss leads to models biased toward the majority class. The model is likely to predict most customers as non-churned, achieving high nominal accuracy but failing to detect churn effectively. This approach is unsuitable when the minority class carries significant business value, such as preventing customer attrition.
Class weighting or resampling combined with precision, recall, and F1 score evaluation is the most effective approach for handling imbalanced churn datasets. It ensures the model identifies high-risk customers accurately, supports actionable interventions, and maintains a balanced assessment of model performance. This strategy improves predictive power and allows the service to implement targeted retention measures efficiently.