17.16. Assignment 3

Instructions: This problem set should be done individually.

Answer each question in the designated space below.

After you are done, save your work and upload it to Blackboard.

Please check that you are submitting the correct file. One way to avoid mistakes is to save it with a different name.

17.17. Write your name and Simon email

Please write your name and email below.

  • [Name]:

  • [email]:

17.18. Discussion Forum Assignment

Click on the “Market Efficiency” link in the discussion forum assignment

https://edstem.org/us/courses/30665/discussion/1972568

17.19. Exercises

Exercise 1.

Start by importing pandas and loading the data set.

The dataset has address https://raw.githubusercontent.com/amoreira2/Lectures/main/assets/data/CAhousing.csv

Name this dataset housing, then print it with print(housing).

import pandas as pd
import numpy as np
# import data here
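As a rough sketch of how loading works, the example below reads a small CSV from an in-memory string. The CSV contents and the name `df` are made up for illustration. `pd.read_csv` also accepts a URL string in place of the file-like object, which is what the exercise calls for.

```python
import io
import pandas as pd

# Stand-in for a remote file; the contents are made up for illustration.
csv_text = "city,price\nA,1.5\nB,2.0\n"

# read_csv accepts a URL string, a local path, or a file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```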

Exercise 2.

Print the first 5 rows of the dataframe.

Hint: use head

# your code here

Exercise 3.

Write a Pandas program to find the number of rows and columns of the housing DataFrame and the data type of each column.

print("Number of rows and columns:")
# your code here
print("\nData type of each column:")
# your code here
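For reference, a minimal sketch of the two attributes involved, shown on a toy DataFrame whose column names are invented for illustration (it is not the assignment data):

```python
import pandas as pd

# Toy data; the column names are made up for illustration.
df = pd.DataFrame({"city": ["A", "B", "C"],
                   "price": [1.5, 2.0, 3.25],
                   "rooms": [2, 3, 4]})

n_rows, n_cols = df.shape  # shape is a (rows, columns) tuple
print(n_rows, n_cols)
print(df.dtypes)           # one dtype per column
```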

Exercise 4.

Write a Pandas program to combine longitude and latitude into one new column. Name the new column ‘Coordinates’. Each value has the format of [longitude,latitude]. For example, the first row has the value of [-122.23,37.88].

# your code here

print(housing.head())
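One possible way to pair two numeric columns into a single list-valued column is sketched below on toy data. The column names `x`, `y`, and `xy` are hypothetical, and other approaches (for example `zip`) would work as well.

```python
import pandas as pd

# Toy data; the column names are made up for illustration.
df = pd.DataFrame({"x": [1.0, 2.5], "y": [10.0, 20.5]})

# Convert the two columns to a list of [x, y] pairs, one pair per row,
# and store each pair as a single cell in the new column.
df["xy"] = df[["x", "y"]].to_numpy().tolist()
print(df)
```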

Exercise 5.

Write a Pandas program to set the column of population as the index of the DataFrame.

# your code here

print(housing.head())
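A small sketch of `set_index` on toy data follows. The column names are invented. Note that `set_index` returns a new DataFrame by default, so the result has to be assigned to a name to be kept.

```python
import pandas as pd

# Toy data; the column names are made up for illustration.
df = pd.DataFrame({"id": [101, 102, 103], "score": [0.5, 0.7, 0.9]})

# set_index returns a new DataFrame; df itself is left unchanged.
indexed = df.set_index("id")
print(indexed)
```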

Exercise 6.

Reimport the data to drop the changes in the preceding exercises.

# import data here

print(housing)

Exercise 7.

Create a new column called “incomeLevel” and assign its values using the following rules:

  • “Low” if medianIncome is in the bottom 25% (at or below the 25th percentile)

  • “Below Median” if medianIncome is above the 25th percentile but below the 50th percentile

  • “Above Median” if medianIncome is above the 50th percentile but below the 75th percentile

  • “High” if medianIncome is in the top 25% (above the 75th percentile)

Hint: use the quantile function

# your code here

print(housing)
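The idea of labelling rows by quartile can be sketched as follows on a toy column. The names `value` and `level` are hypothetical, and combining `quantile` with `pd.cut` is only one of several ways to do this. With `pd.cut`, a value exactly equal to a cutoff falls in the lower bin, which is an assumption about how ties should be treated.

```python
import numpy as np
import pandas as pd

# Toy data; the column names are made up for illustration.
df = pd.DataFrame({"value": [1, 2, 3, 4, 5, 6, 7, 8]})

# Quartile cutoffs computed from the data itself.
q25, q50, q75 = df["value"].quantile([0.25, 0.50, 0.75])

# Bins are right-closed: (-inf, q25], (q25, q50], (q50, q75], (q75, inf).
df["level"] = pd.cut(
    df["value"],
    bins=[-np.inf, q25, q50, q75, np.inf],
    labels=["Low", "Below Median", "Above Median", "High"],
).astype(str)
print(df)
```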

Exercise 8. (hard)

Write a Pandas program that uses groupby to calculate the count, minimum, and maximum of medianHouseValue for each category of incomeLevel.

# your code here
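A minimal sketch of grouped summary statistics on toy data is shown below. The names `group` and `value` are invented. Passing a list of function names to `agg` produces one output column per statistic.

```python
import pandas as pd

# Toy data; the column names are made up for illustration.
df = pd.DataFrame({"group": ["a", "a", "b", "b", "b"],
                   "value": [10, 30, 5, 15, 25]})

# Select one column from the grouped object, then compute several
# statistics per group in a single call.
summary = df.groupby("group")["value"].agg(["count", "min", "max"])
print(summary)
```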

Exercise 9.

Write a Pandas program to show the data where the incomeLevel is either ‘High’ or ‘Above Median’.

Hint: try using isin

# your code here
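As an illustration of membership filtering, the sketch below keeps the rows whose label appears in a given list. The toy column `color` and its values are made up.

```python
import pandas as pd

# Toy data; the column name and values are made up for illustration.
df = pd.DataFrame({"color": ["red", "blue", "green", "red"],
                   "n": [1, 2, 3, 4]})

# isin builds a boolean Series that is True where the value is in the list.
mask = df["color"].isin(["red", "green"])
selected = df[mask]
print(selected)
```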

Exercise 10.

Write a Pandas program to read rows 0 through 2 (inclusive), columns ‘totalRooms’ and ‘totalBedrooms’ of the DataFrame.

# your code here
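The word "inclusive" matters here: label-based slicing with `loc` includes the end label, while position-based slicing with `iloc` excludes the end position. A short sketch on toy data (hypothetical columns `a`, `b`, `c`, default integer index) shows the difference:

```python
import pandas as pd

# Toy data; the column names are made up for illustration.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5],
                   "b": [10, 20, 30, 40, 50],
                   "c": [100, 200, 300, 400, 500]})

by_label = df.loc[0:2, ["a", "b"]]  # rows labelled 0, 1, and 2
by_position = df.iloc[0:2]          # rows at positions 0 and 1 only
print(by_label)
```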

Exercise 11.

Write a Pandas program to create a histogram of the ‘households’ Series (distribution of a numerical variable) of the DataFrame.

# your code here

Exercise 12.

Write a Pandas program to get 5 randomly sampled rows from the DataFrame.

Hint: you will need sample.

# your code here
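A brief sketch of random sampling on toy data follows. The column name `n` is invented, and `random_state` is an optional argument used here only so the draw is reproducible.

```python
import pandas as pd

# Toy data; the column name is made up for illustration.
df = pd.DataFrame({"n": range(10)})

# Draw 3 rows without replacement; random_state fixes the seed.
picked = df.sample(n=3, random_state=42)
print(picked)
```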

Exercise 13.

Write a Pandas program to show the data where population > 2500 and medianIncome > 12.

# your code here
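Combining two conditions can be sketched as below on toy data with hypothetical columns `size` and `income`. Each condition needs its own parentheses because `&` binds more tightly than `>` in Python.

```python
import pandas as pd

# Toy data; the column names are made up for illustration.
df = pd.DataFrame({"size": [100, 3000, 5000],
                   "income": [15.0, 13.0, 2.0]})

# Element-wise AND of two boolean Series; note the parentheses.
mask = (df["size"] > 2500) & (df["income"] > 12)
filtered = df[mask]
print(filtered)
```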

Exercise 14.

Write a Pandas program to display all column names of the DataFrame.

# your code here

Exercise 15.

Write a Pandas program to sort the data by the value of medianHouseValue in descending order.

# your code here
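For reference, a minimal sketch of descending sort on toy data with a made-up column `value`. `sort_values` returns a new DataFrame and keeps the original row labels.

```python
import pandas as pd

# Toy data; the column name is made up for illustration.
df = pd.DataFrame({"value": [3, 1, 2]})

# ascending=False sorts from largest to smallest.
ordered = df.sort_values("value", ascending=False)
print(ordered)
```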