The Google Professional Machine Learning Engineer certification is designed for professionals who build and deploy machine learning models on Google Cloud Platform. It suits candidates who have experience developing machine learning solutions, are familiar with cloud computing technologies, and want to pursue a career in machine learning engineering.

Google Professional Machine Learning Engineer Sample Questions (Q106-Q111):

A Machine Learning Specialist is given a structured dataset on the shopping habits of a company's customer base. The dataset contains thousands of columns of data and hundreds of numerical columns for each customer. The Specialist wants to identify whether there are natural groupings for these columns across all customers and visualize the results as quickly as possible.
What approach should the Specialist take to accomplish these tasks?

  • A. Embed the numerical features using the t-distributed stochastic neighbor embedding (t-SNE) algorithm and create a scatter plot.
  • B. Run k-means using the Euclidean distance measure for different values of k and create box plots for each numerical column within each cluster.
  • C. Run k-means using the Euclidean distance measure for different values of k and create an elbow plot.
  • D. Embed the numerical features using the t-distributed stochastic neighbor embedding (t-SNE) algorithm and create a line graph.

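Answer: A

Explanation: t-SNE projects the high-dimensional numerical features onto two dimensions, so a scatter plot of the embedding shows any natural groupings directly. An elbow plot only suggests a number of clusters; it does not show the groupings themselves.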
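Options B and C both rely on running k-means with the Euclidean distance for several values of k. As a minimal sketch of that step, the pure-Python example below computes the within-cluster sum of squared distances (the quantity an elbow plot would chart against k). The toy two-dimensional data, the function names, and the fixed iteration count are assumptions made for illustration; a real dataset with hundreds of columns would normally use a library implementation.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def kmeans_inertia(points, k, iters=20, seed=0):
    """Run basic Lloyd's k-means; return the sum of squared distances to the nearest center."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k distinct points as initial centers
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # Recompute each center; keep the old one if its cluster is empty.
        centers = [mean_point(cl) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def make_toy_points(seed=0):
    """Hypothetical data: three 2-D blobs of 30 points each."""
    rng = random.Random(seed)
    blob_centers = [(0.0, 0.0), (10.0, 10.0), (-10.0, 10.0)]
    return [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
            for cx, cy in blob_centers for _ in range(30)]


points = make_toy_points()
inertias = {k: kmeans_inertia(points, k) for k in range(1, 7)}
for k, value in inertias.items():
    print(f"k={k}: inertia={value:.1f}")
```

Plotting these values against k gives the elbow curve. The curve summarizes how tightly the points cluster for each k, which is a different view from a two-dimensional scatter plot of the points themselves.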

A company uses a long short-term memory (LSTM) model to evaluate the risk factors of a particular energy sector. The model reviews multi-page text documents to analyze each sentence of the text and categorize it as either a potential risk or no risk. The model is not performing well, even though the Data Scientist has experimented with many different network structures and tuned the corresponding hyperparameters.
Which approach will provide the MAXIMUM performance boost?

  • A. Use gated recurrent units (GRUs) instead of LSTM and run the training process until the validation loss stops decreasing.
  • B. Initialize the words by word2vec embeddings pretrained on a large collection of news articles related to the energy sector.
  • C. Initialize the words by term frequency-inverse document frequency (TF-IDF) vectors pretrained on a large collection of news articles related to the energy sector.
  • D. Reduce the learning rate and run the training process until the training loss stops decreasing.

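Answer: B

Explanation: The network structure and hyperparameters have already been tuned, so the remaining lever is the input representation. Word2vec embeddings pretrained on in-domain text give the model word semantics that it cannot learn from a limited labeled set, whereas TF-IDF produces document-level term weights rather than word vectors.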
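Option B describes initializing the word representations from pretrained word2vec embeddings. A rough sketch of that initialization step follows, assuming the pretrained vectors are already loaded into a plain dictionary. The three-dimensional vectors, the vocabulary, and the helper name `build_embedding_matrix` are hypothetical and chosen only for illustration.

```python
import random


def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Return one row per vocabulary word.

    Words found in `pretrained` get their pretrained vector; all other
    words get small random values so they can still be trained.
    """
    rng = random.Random(seed)
    matrix = []
    for word in vocab:
        vector = pretrained.get(word)
        if vector is None:
            vector = [rng.uniform(-0.05, 0.05) for _ in range(dim)]
        matrix.append(list(vector))
    return matrix


# Hypothetical 3-dimensional vectors standing in for real word2vec output.
pretrained = {
    "pipeline": [0.21, -0.40, 0.13],
    "outage": [0.19, -0.35, 0.10],
}
vocab = ["pipeline", "outage", "curtailment"]

matrix = build_embedding_matrix(vocab, pretrained, dim=3)
for word, row in zip(vocab, matrix):
    print(word, row)
```

The resulting matrix would typically be loaded as the initial weights of the model's embedding layer, so training starts from word vectors that already encode domain vocabulary.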

Your data science team needs to rapidly experiment with various features, model architectures, and hyperparameters. They need to track the accuracy metrics for various experiments and use an API to query the metrics over time. What should they use to track and report their experiments while minimizing manual effort?

  • A. Use AI Platform Training to execute the experiments. Write the accuracy metrics to Cloud Monitoring, and query the results using the Monitoring API.
  • B. Use Kubeflow Pipelines to execute the experiments. Export the metrics file, and query the results using the Kubeflow Pipelines API.
  • C. Use AI Platform Training to execute the experiments. Write the accuracy metrics to BigQuery, and query the results using the BigQuery API.
  • D. Use AI Platform Notebooks to execute the experiments. Collect the results in a shared Google Sheets file, and query the results using the Google Sheets API.

Answer: B

Explanation: Kubeflow Pipelines (KFP) provides a way to deploy robust, repeatable machine learning pipelines along with monitoring, auditing, version tracking, and reproducibility. Cloud AI Pipelines makes it easy to set up a KFP installation.
"Kubeflow Pipelines supports the export of scalar metrics. You can write a list of metrics to a local file to describe the performance of the model. The pipeline agent uploads the local file as your run-time metrics. You can view the uploaded metrics as a visualization in the Runs page for a particular experiment in the Kubeflow Pipelines UI."

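The quoted passage says a component writes a list of metrics to a local file. The sketch below shows one way that file could be produced using only the standard library. The JSON layout (`metrics`, `name`, `numberValue`, `format`) and the file name `mlpipeline-metrics.json` follow the Kubeflow Pipelines v1 SDK documentation as best recalled here, so treat them as assumptions to check against the version you run. The helper name `write_kfp_metrics` and the temporary directory are hypothetical.

```python
import json
import os
import tempfile


def write_kfp_metrics(path, scores):
    """Write scalar metrics to `path` in the assumed KFP v1 metrics layout."""
    payload = {
        "metrics": [
            {"name": name, "numberValue": float(value), "format": "RAW"}
            for name, value in scores.items()
        ]
    }
    with open(path, "w") as f:
        json.dump(payload, f)
    return payload


# A temporary directory stands in for the component's output location.
metrics_path = os.path.join(tempfile.mkdtemp(), "mlpipeline-metrics.json")
payload = write_kfp_metrics(metrics_path, {"accuracy-score": 0.93})
print(json.dumps(payload, indent=2))
```

In a real pipeline the component would write to the location the pipeline agent expects and declare the file as an output, which is what lets the metrics appear on the Runs page.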
A web-based company wants to improve the conversion rate on its landing page. Using a large historical dataset of customer visits, the company has repeatedly trained a multi-class deep learning network on Amazon SageMaker. However, the model is overfitting: predictions on the training data are 90% accurate, while predictions on the test data are only 70% accurate.
The company needs to boost the generalization of its model before deploying it into production to maximize conversions of visits to purchases.
Which action is recommended to provide the HIGHEST accuracy model for the company's test and validation data?

  • A. Allocate a higher proportion of the overall data to the training dataset.
  • B. Apply L1 or L2 regularization and dropout during training.
  • C. Reduce the number of layers and units (or neurons) in the deep learning network.
  • D. Increase the randomization of training data in the mini-batches used in training.

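Answer: B

Explanation: A 20-point gap between training and test accuracy indicates overfitting. L1 or L2 regularization and dropout constrain the model without discarding its capacity, whereas removing layers and units also reduces overfitting but can limit the accuracy the network is able to reach.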
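The options mention L1 and L2 regularization and dropout. As a minimal sketch of what those techniques compute, the example below implements the two penalty terms and inverted dropout in plain Python. The weights, activations, and rate are hypothetical values, and a deep learning framework would normally supply these operations.

```python
import random


def l1_penalty(weights, lam):
    """L1 term: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 term: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def dropout(activations, rate, rng):
    """Inverted dropout for training.

    Each unit is zeroed with probability `rate`; surviving units are
    scaled by 1 / (1 - rate) so the expected activation is unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


weights = [0.5, -1.0, 2.0]          # hypothetical layer weights
activations = [1.0, 2.0, 3.0, 4.0]  # hypothetical layer outputs

print("L1 penalty:", l1_penalty(weights, lam=0.01))
print("L2 penalty:", l2_penalty(weights, lam=0.01))
print("After dropout:", dropout(activations, rate=0.5, rng=random.Random(0)))
```

The penalty is added to the training loss, which discourages large weights, while dropout is applied only during training and is switched off at prediction time.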

Your company manages a video sharing website where users can watch and upload videos. You need to create an ML model to predict which newly uploaded videos will be the most popular so that those videos can be prioritized on your company's website.
Which result should you use to determine whether the model is successful?

  • A. The Pearson correlation coefficient between the log-transformed number of views after 7 days and 30 days after publication is equal to 0.
  • B. The model predicts 95% of the most popular videos measured by watch time within 30 days of being uploaded.
  • C. The model predicts 97.5% of the most popular clickbait videos measured by number of clicks.
  • D. The model predicts videos as popular if the user who uploads them has over 10,000 likes.

Answer: B
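Option B measures success as the share of the most popular videos, ranked by watch time, that the model identifies. A rough sketch of that calculation follows. The video IDs, watch-time figures, cutoff `k`, and the function name `top_k_recall` are hypothetical.

```python
def top_k_recall(predicted_popular, watch_time_by_video, k):
    """Fraction of the k most-watched videos that the model flagged as popular."""
    ranked = sorted(watch_time_by_video, key=watch_time_by_video.get, reverse=True)
    top_k = set(ranked[:k])
    if not top_k:
        raise ValueError("need at least one video and k >= 1")
    return len(top_k & set(predicted_popular)) / len(top_k)


# Hypothetical watch time in minutes over the first 30 days after upload.
watch_time = {"v1": 5400, "v2": 120, "v3": 9800, "v4": 30, "v5": 4100}
# Hypothetical set of videos the model predicted would be popular.
predicted = {"v1", "v2", "v3"}

recall = top_k_recall(predicted, watch_time, k=3)
print(f"Recall of the top 3 videos by watch time: {recall:.2f}")
```

A success criterion like the one in option B would compare this value against a target such as 0.95 on held-out uploads.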


