Python code/output snippets

Submit your HW as a Word or PDF file, with Python code/output snippets pasted as images.

Part 1. 6 points

Read Chapter 2 of Géron’s book.
Open https://github.com/ageron/handson-ml2/blob/master/02_end_to_end_machine_learning_project.ipynb . Carefully review each line of code in the notebook and make sure you understand what it is meant to achieve. Chapter 2 describes the entire ML project that is implemented in the notebook.
Copy, paste and run the entire ML notebook in your Python environment. After completing this, add the line below to the notebook, then copy and paste the line and its output into your submission (2 points):
print("I, <your name>, have successfully run the Chapter 2 notebook in my environment")

Answer the following questions (2 sentences per answer; unnecessarily long answers will receive a lower grade) (2 points per question):
What does line 5 do? Why is it important?
What does line 20 do? Why is it important?
What does line 38 do? Why is it important?
What does line 49 do? When would you perform this operation?
What does line 66 do? When would you perform this operation?

Part 2. – 14 points – 2 points per question

watson_healthcare_modified.csv

Go to Kaggle.com and download the Employee Attrition for Healthcare dataset from https://www.kaggle.com/datasets/925f54cca84887ec452f1ae1cd430ba9b37cfa555c2e1575760c3f2265c3a696?resource=download
Open the dataset in Excel and review the file. Provide answers to the following questions:
Formulate a business problem that could be addressed by building a supervised ML model using this dataset. What would be the target variable in your ML formulation? What type of ML problem would this be?
Based on your business problem description, articulate business requirements for the ML-based solution: Frequency of predictions, number of predictions per period, on-line vs. batch predictions, frequency of model updates, other?
Open the dataset in pandas. Using descriptive statistics and data visualization tools, answer the following questions:
a. How would you describe the dataset and its fitness for addressing the problem at hand in terms of data quality and signal, and why? Include Python code and output in your answer.
b. Based on the descriptive analysis, which variables do you expect to be most useful for addressing your problem, and why? Include Python code and output in your answer.
c. What transformations do you expect to perform on your data before you use it to build an ML model with an algorithm of your choice, and why? Indicate all necessary transformations for every variable that you include in your answer to question b.
d. Demonstrate the following transformations using variables of your choice from your answer to question b:
   One-hot encoding
   Scaling (normalization or standardization)
   Imputation using a mean or mode
   Log transformation
e. Identify two new features that could be useful for your classification problem and that can be constructed from the data. Create these features.
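
As a rough illustration of what the transformations listed above can look like in pandas, here is a minimal sketch on a small made-up table. The column names (department, age, monthly_income, years_at_company, total_working_years), the values, and the tenure_ratio feature are hypothetical placeholders, not taken from the Kaggle file; substitute the variables you actually selected.

```python
import numpy as np
import pandas as pd

# Toy data with hypothetical column names and values (NOT the real dataset).
df = pd.DataFrame({
    "department": ["Cardiology", "Maternity", "Neurology", "Maternity"],
    "age": [34.0, np.nan, 45.0, 29.0],
    "monthly_income": [3000.0, 5200.0, 12000.0, 4100.0],
    "years_at_company": [2, 5, 10, 1],
    "total_working_years": [8, 5, 20, 4],
})

# Descriptive check: summary statistics and missing values per column.
summary = df.describe()
missing = df.isna().sum()

# Imputation using the mean: fill missing ages with the column mean.
df["age"] = df["age"].fillna(df["age"].mean())

# One-hot encoding: one indicator column per department value.
one_hot = pd.get_dummies(df["department"], prefix="department")

# Scaling (standardization): zero mean, unit variance (population std).
income = df["monthly_income"]
df["income_std"] = (income - income.mean()) / income.std(ddof=0)

# Log transformation: log(1 + x) compresses a right-skewed variable.
df["income_log"] = np.log1p(df["monthly_income"])

# Example constructed feature (hypothetical): share of career spent at
# the current employer. Assumes total_working_years is never zero.
df["tenure_ratio"] = df["years_at_company"] / df["total_working_years"]

print(missing)
print(df.join(one_hot))
```

Mean imputation and standardization here use plain pandas arithmetic to keep the sketch dependency-free; scikit-learn's SimpleImputer, OneHotEncoder and StandardScaler are a common alternative when the same steps must be refit on training data only.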
