Task
Machine Learning
This document sets out the two (2) questions you are to complete for CAB420 Assignment 1C. The assignment is worth 12% of the overall subject grade. All questions are weighted equally. Students are to work individually. Students should submit their answers in a single document (either a PDF or Word document) and upload this to TurnItIn.
Further Instructions:
1. Data required for this assessment is available on Blackboard alongside this document in CAB420 Assessment 1C Data.zip. Please refer to individual questions regarding which data to use for which question.
2. Answers should be submitted via the TurnItIn submission system, linked to on Blackboard. In the event that TurnItIn is down, or you are unable to submit via TurnItIn, please email your responses.
3. For each question, a concise written response (approximately 2-3 pages) is expected. This response should explain and justify the approach taken to address the question (including, if relevant, why the approach was selected over other possible methods), and include results, relevant figures, and analysis. Python notebooks or similar materials will not, on their own, constitute a valid response to a question and will score a mark of 0.
4. Python code, including live scripts or notebooks (or equivalent materials for other languages) may optionally be included as appendices. Figures and outputs/results that are critical to question answers should be included in the main question response, and not appear only in an appendix.
5. Students who require an extension should lodge their extension application with HiQ. Please note that teaching staff (including the unit coordinator) cannot grant extensions.
Problem 1. Clustering and Recommendations. Recommendation engines are typically built around clustering, i.e. finding a group of people similar to a person of interest and making recommendations for the target person based on the response of other subjects within the identified cluster.
You have been provided with a copy of the MovieLens small dataset, which contains movie review data for 600 subjects. The data is contained in the Q1 directory within the data archive, and is split over several files as follows:
- ratings.csv: Contains the movie ratings, and consists of a user ID, a movie ID, a rating (out of 5), and a timestamp.
- movies.csv: A list of all movie IDs, alongside the movie titles and a list of genres.
- tags.csv: A list of tags applied to movies by users. Each entry consists of a user ID, a movie ID, the text tag, and a timestamp.
- links.csv: Contains IDs to link the MovieLens dataset to IMDB and TMDB.
It is recommended that you do not use the tags.csv and links.csv files, though they are contained here for completeness and you may choose to use them if you wish.
You have been provided with data loading functions for ratings.csv and movies.csv that will:
- Compute the average rating of each movie;
- Reformat the list of genres in movies.csv to a set of columns, one per genre, where a value of 1 indicates that the movie belongs to that genre and a value of NaN indicates that it does not;
- Merge the ratings.csv and movies.csv tables, obtaining a table that provides detailed information on each movie a user has reviewed;
- Create a combined table that computes the average rating each user has reported for movies belonging to each genre.
Note that each movie can belong to multiple genres.
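The four steps above can be sketched in pandas as follows. This is only an illustration of the described behaviour, not the provided loading code: the column names (`userId`, `movieId`, `rating`, `genres`) and the `|` genre separator are assumptions based on the usual MovieLens layout, and the tiny tables are made-up stand-ins for ratings.csv and movies.csv.

```python
import numpy as np
import pandas as pd

# Tiny made-up stand-ins for ratings.csv and movies.csv (assumed column names).
ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 2],
    "movieId": [10, 20, 10, 30],
    "rating":  [4.0, 2.0, 5.0, 3.0],
})
movies = pd.DataFrame({
    "movieId": [10, 20, 30],
    "title":   ["A", "B", "C"],
    "genres":  ["Action|Comedy", "Comedy", "Drama"],
})

# Average rating of each movie.
movie_avg = ratings.groupby("movieId")["rating"].mean()

# One column per genre: 1 where the movie has the genre, NaN where it does not.
genre_flags = (
    movies["genres"].str.get_dummies(sep="|").astype(float).replace(0.0, np.nan)
)
movies_wide = pd.concat([movies[["movieId", "title"]], genre_flags], axis=1)

# Merge ratings with the per-genre flags: one row per (user, movie) review.
merged = ratings.merge(movies_wide, on="movieId", how="left")

# Multiply each flag by the rating: 1 becomes the rating, NaN stays NaN.
genre_cols = list(genre_flags.columns)
genre_ratings = merged[genre_cols].mul(merged["rating"], axis=0)

# Mean per user and genre; NaN entries are skipped, so a genre the user
# never rated stays NaN in the combined table.
user_genre_avg = genre_ratings.groupby(merged["userId"]).mean()
print(user_genre_avg)
```

Because a movie can carry several genres, a single rating contributes to every genre column that movie belongs to.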
Your Task: Using the provided data and (optionally) the code described above, you are to develop a method to cluster users based on their movie viewing preferences. Having developed this, provide recommendations for the users with the IDs 4, 42, and 314.
A suggested approach to solving this problem is to:
- Cluster the combined table that contains the average rating each user has reported for movies belonging to each genre. You will have to decide how you treat genres that have an average rating of NaN, which indicates that the user has not watched any movies from this genre; and select an appropriate clustering method and clustering hyper-parameters.
- Identify the clusters that contain the target users, 4, 42, and 314.
- Find the most popular movies within clusters that contain the target users, that the target users have not already seen.
- Note that the above is simply a suggested approach, and you are welcome to select an alternate method.
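A minimal sketch of the suggested approach, on made-up toy data, might look like the following. Several choices here are assumptions for illustration only: NaN genres are filled with the genre mean (one of several possible treatments), clustering uses a small hand-written k-means with a deterministic farthest-first initialisation (a library implementation could equally be used), and "most popular" is taken to mean the highest mean rating among cluster members with at least `min_ratings` ratings. The helper names `kmeans` and `recommend` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the per-user, per-genre average table (made-up values).
user_genre_avg = pd.DataFrame(
    {"Action": [5.0, 4.8, 4.9, 1.0, 1.2, 1.1],
     "Comedy": [3.0, np.nan, 3.1, 2.9, 3.0, 3.0],
     "Drama":  [1.0, 1.2, 1.1, 5.0, 4.8, 4.9]},
    index=pd.Index([4, 7, 8, 9, 10, 11], name="userId"),
)
ratings = pd.DataFrame({
    "userId":  [4, 7, 7, 7, 8, 8, 9, 10, 11],
    "movieId": [1, 1, 2, 3, 2, 3, 4, 4, 5],
    "rating":  [5.0, 4.5, 5.0, 3.0, 4.0, 4.5, 5.0, 4.0, 5.0],
})

def kmeans(X, k, n_iter=50):
    """Lloyd's algorithm with farthest-first initialisation (deterministic)."""
    centres = [X[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen centre.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centres], axis=0)
        centres.append(X[d.argmax()])
    centres = np.array(centres)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels

def recommend(user_id, labels, ratings, top_n=2, min_ratings=2):
    """Top-rated movies in the user's cluster that the user has not rated."""
    cluster_users = labels[labels == labels.loc[user_id]].index
    seen = set(ratings.loc[ratings["userId"] == user_id, "movieId"])
    pool = ratings[ratings["userId"].isin(cluster_users)
                   & ~ratings["movieId"].isin(seen)]
    stats = pool.groupby("movieId")["rating"].agg(["mean", "count"])
    stats = stats[stats["count"] >= min_ratings]
    ranked = stats.sort_values(["mean", "count"], ascending=False)
    return ranked.head(top_n).index.tolist()

# Assumed NaN treatment: replace an unrated genre with that genre's mean.
X = user_genre_avg.fillna(user_genre_avg.mean()).to_numpy()
labels = pd.Series(kmeans(X, k=2), index=user_genre_avg.index)
print(recommend(4, labels, ratings))
```

On the real data, the number of clusters, the NaN treatment, and the popularity criterion are exactly the decisions the written response is expected to justify.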
Your final response should include sections that address the following:
- A description of and justification for your clustering method. This should include:
- Description and justification of the data that you chose to cluster;
- Justification for the selected clustering method;
- Justification for the selected clustering hyper-parameters.
- A brief discussion and analysis of the results of the clustering, including interpretation of the resultant clusters (i.e. are clusters distinct, do they capture different viewer habits?)
- Recommendations for the three users with IDs: 4, 42, and 314; and a short discussion of these recommendations which includes:
- A brief description and justification for how recommendations were obtained;
- Whether the recommendations make sense given these users' viewing history and previous ratings.
Problem 2. Multi-Task Learning. Semantic person search is the task of matching a person to a semantic query. For example, given the query ‘1.8m tall man wearing jeans and a red shirt’, a semantic person search method should return images that feature people matching that description. As such, a semantic search process needs to consider multiple traits. A simple approach to enable this form of search is to use classification to determine the traits present in an input image.
You have been provided with a dataset (see Q2/Q2.tar.gz) that contains the following semantic annotations:
- Gender: -1 (unknown), 0 (male), 1 (female)
- Pose: -1 (unknown), 0 (front), 1 (back), 2 (45 degrees), 3 (90 degrees)
- Torso Clothing Type: -1 (unknown), 0 (long), 1 (short)
- Torso Clothing Colour: -1 (unknown), 0 (black), 1 (blue), 2 (brown), 3 (green), 4 (grey), 5 (orange), 6 (pink), 7 (purple), 8 (red), 9 (white), 10 (yellow)
- Torso Clothing Texture: -1 (unknown), 0 (irregular), 1 (plaid), 2 (diagonal plaid), 3 (plain), 4 (spots), 5 (diagonal stripes), 6 (horizontal stripes), 7 (vertical stripes)
- Leg Clothing Type: -1 (unknown), 0 (long), 1 (short)
- Leg Clothing Colour: -1 (unknown), 0 (black), 1 (brown), 2 (blue), 3 (green), 4 (grey), 5 (orange), 6 (pink), 7 (purple), 8 (red), 9 (white), 10 (yellow)
- Leg Clothing Texture: -1 (unknown), 0 (irregular), 1 (plaid), 2 (diagonal plaid), 3 (plain), 4 (spots), 5 (diagonal stripes), 6 (horizontal stripes), 7 (vertical stripes)
- Luggage: -1 (unknown), 0 (yes), 1 (no)
The unknown class can be considered either a class in its own right (i.e. three classes of gender), or can be considered as missing data. Note that three colours are annotated for each of the torso and leg clothing colour, indicating the primary, secondary and tertiary colours. One or both of the secondary and tertiary colours may be set to unknown (-1) to indicate that there are only 1 or 2 colours in the garment.
In addition, the dataset contains semantic segmentation for each image in the training data, which breaks the image down into the following regions:
- Leg clothing
- Shoes
- Torso clothing
- Luggage
- Leg skin regions
- Torso/arm skin regions
- Facial skin regions
- Hair
Your Task: Using this data you are to implement a multi-task deep learning approach that, given an input image, classifies the traits:
- Gender
- Torso Clothing Type
- Primary Torso Clothing Colour
- Leg Clothing Type
- Primary Leg Clothing Colour, and
- Presence of Luggage.
Pose and the semantic segmentation data may optionally be used when developing your approach (though remember that semantic segmentation data is only available for the training set, so cannot be used as a model input). Additional traits (clothing texture, secondary and tertiary torso and leg colours) should be ignored.
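If unknown labels (-1) are treated as missing data, one option is to leave those samples out of the loss for the affected trait while still using them for the other traits. The NumPy sketch below illustrates that idea only. The function names `masked_cross_entropy` and `multi_task_loss` are hypothetical, equal task weights are an assumption, and a real implementation would express the same masking inside the deep learning framework's loss. The alternative, treating unknown as its own class, would simply shift the labels up by one and needs no mask.

```python
import numpy as np

def masked_cross_entropy(probs, labels, unknown=-1):
    """Mean cross-entropy over the samples whose label is known.

    probs:  (n_samples, n_classes) predicted class probabilities for one trait.
    labels: (n_samples,) integer labels, where `unknown` marks missing data.
    """
    rows = np.flatnonzero(labels != unknown)
    if rows.size == 0:
        return 0.0  # no known labels for this trait in the batch
    picked = probs[rows, labels[rows]]  # probability given to the true class
    return float(-np.log(np.clip(picked, 1e-12, 1.0)).mean())

def multi_task_loss(outputs, targets, weights=None):
    """Weighted sum of the per-trait masked losses (equal weights by default)."""
    weights = weights or {name: 1.0 for name in outputs}
    return sum(
        weights[name] * masked_cross_entropy(outputs[name], targets[name])
        for name in outputs
    )

# Made-up predictions for a batch of three images and two traits.
outputs = {
    "gender":  np.array([[0.5, 0.5], [0.9, 0.1], [0.25, 0.75]]),
    "luggage": np.array([[0.6, 0.4], [0.3, 0.7], [0.5, 0.5]]),
}
targets = {
    "gender":  np.array([0, -1, 1]),    # second sample has unknown gender
    "luggage": np.array([-1, -1, -1]),  # luggage unknown for the whole batch
}
print(multi_task_loss(outputs, targets))
```

In this toy batch only the first and third gender labels contribute, so the unknown entries neither reward nor penalise the network.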
You have been provided code to:
- Load the images, labels and semantic masks ready for use with keras and tensorflow;
- Demonstrate how to use a generator to augment an input image and produce multiple outputs.
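As a rough picture of what such a generator does, the sketch below yields an augmented image batch together with one label array per trait. It is not the provided code: the function name `multi_output_generator`, the channels-last image layout `(n, height, width, channels)`, and the choice of random horizontal flips as the only augmentation are assumptions made for illustration.

```python
import numpy as np

def multi_output_generator(images, labels, batch_size, seed=0):
    """Yield (image_batch, {trait: label_batch}) indefinitely.

    images: array of shape (n, height, width, channels).
    labels: dict mapping trait name to an (n,) array of integer labels.
    batch_size must not exceed n, since samples are drawn without replacement.
    """
    rng = np.random.default_rng(seed)
    n = len(images)
    while True:
        idx = rng.choice(n, size=batch_size, replace=False)
        batch = images[idx]  # fancy indexing returns a copy
        # Flip roughly half of the batch left-to-right (width axis).
        flip = rng.random(batch_size) < 0.5
        batch[flip] = batch[flip, :, ::-1, :]
        # The same indices select the labels for every output head.
        yield batch, {name: values[idx] for name, values in labels.items()}

# Made-up data: four 2x3 single-channel "images" and two traits.
images = np.arange(4 * 2 * 3, dtype=float).reshape(4, 2, 3, 1)
labels = {"gender": np.array([0, 1, 0, 1]), "luggage": np.array([1, 1, 0, -1])}

gen = multi_output_generator(images, labels, batch_size=2)
x, y = next(gen)
print(x.shape, {name: values.shape for name, values in y.items()})
```

Horizontal flips leave all of the target traits unchanged, which is why they are a convenient example; any augmentation that could alter a label (for instance heavy colour jitter for the colour traits) would need more care.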
Your final response should include sections that address the following:
- Any pre-processing that is performed on the data (cropping, resizing), any data augmentation that is used, and how the missing data (i.e. instances of -1) are handled. Note that you may wish to crop and/or resize data to reduce the computational demands of your approach. This is completely acceptable, though the pre-processing should be explained, and care should be taken to ensure that the images are not resized to such an extent that traits become indistinguishable.
- A description and justification for your approach. This should include justification for the network design and training. If you choose to use a pre-trained neural network (or part of a network) and fine-tune it, details and justification for this must be provided.
- An evaluation of performance for each of the traits using the provided test set. The evaluation should include an investigation of situations where the proposed solution performs poorly, and a discussion on the implications of the performance of the classifiers on the overall task: semantic search. Note that while this discussion should consider the broader context of the semantic search task, you are not required or expected to implement the semantic search task.
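For the per-trait evaluation, a confusion matrix alongside accuracy helps show where a classifier performs poorly. The following NumPy sketch assumes that test samples with an unknown (-1) ground-truth label are excluded from the evaluation of that trait, which is one possible choice rather than a requirement of the task. The function name `evaluate_trait` and the toy labels are hypothetical.

```python
import numpy as np

def evaluate_trait(y_true, y_pred, n_classes, unknown=-1):
    """Accuracy, confusion matrix and per-class recall for a single trait.

    Samples whose true label is `unknown` are ignored.
    Rows of the confusion matrix are true classes, columns are predictions.
    """
    known = y_true != unknown
    y_true, y_pred = y_true[known], y_pred[known]
    confusion = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(confusion, (y_true, y_pred), 1)
    accuracy = float(np.trace(confusion) / max(len(y_true), 1))
    row_sums = confusion.sum(axis=1)
    # Recall is left as NaN for classes that never occur in the known labels.
    recall = np.divide(
        np.diag(confusion), row_sums,
        out=np.full(n_classes, np.nan), where=row_sums > 0,
    )
    return accuracy, confusion, recall

# Made-up labels and predictions for a two-class trait such as gender.
y_true = np.array([0, 0, 1, -1, 1, 1])
y_pred = np.array([0, 1, 1, 0, 1, 0])
accuracy, confusion, recall = evaluate_trait(y_true, y_pred, n_classes=2)
print(accuracy)
print(confusion)
print(recall)
```

Per-class recall matters for the semantic search discussion because a rarely occurring class (an uncommon clothing colour, for example) can be almost never predicted while overall accuracy still looks acceptable.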