Mth301 Final Year Project

Below is a tentative outline of topics provided by Mu He. You may also refer to publications and come up with your own personalized topics (these are more than welcome).

1. Advanced Statistical Methods

Big Data is a hot topic in a variety of fields such as biology and finance. The challenge of understanding these data has led to the development of new statistical tools and of new areas such as machine learning and bioinformatics. Many of these tools are based on traditional statistical thinking, but are often expressed in different terminology.

In this project, students are expected to choose one or several of the following topics and apply the methods to real datasets: logistic regression, classification and regression trees, neural networks, boosting, bagging, unsupervised learning, signal processing, random forests, data visualization, etc. Students are expected to demonstrate their understanding of the tools, apply the tools using software and interpret the software's results.
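As one small illustration of the kind of tool meant here, the following Python sketch fits a logistic regression by Newton-Raphson (equivalent to iteratively reweighted least squares) on simulated data. It is a minimal sketch, not a prescribed method: the function `fit_logistic`, the coefficient values and the simulated data are all hypothetical, and in practice one would usually call an existing routine such as R's `glm` with `family = binomial`.

```python
import numpy as np

def fit_logistic(X, y, max_iter=25, tol=1e-10):
    """Fit logistic regression by Newton-Raphson (equivalently IRLS).

    X: (n, p) covariates without an intercept column; y: (n,) array of 0/1 labels.
    Returns coefficients [intercept, beta_1, ..., beta_p].
    """
    X1 = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))     # fitted probabilities
        grad = X1.T @ (y - p)                    # score vector
        hess = X1.T @ (X1 * (p * (1 - p))[:, None])  # Fisher information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical simulated data (not a real dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2))
true_beta = np.array([-0.5, 1.0, -2.0])
prob = 1.0 / (1.0 + np.exp(-(true_beta[0] + X @ true_beta[1:])))
y = rng.binomial(1, prob)
beta_hat = fit_logistic(X, y)
print(beta_hat)
```

Comparing `beta_hat` with `true_beta` is a quick sanity check before moving to a real dataset, where the fitted coefficients would instead be interpreted as log-odds ratios.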

2. Survival Analysis

Survival data, where the primary outcome is the time until a specific event, arise in many areas of biomedical research, including clinical trials and epidemiological studies. Many survival methods are extensions of techniques used for linear regression and categorical data, while other aspects of this field are unique to survival data.

In this project, students are expected to learn some survival analysis topics, such as the Cox proportional hazards model, the accelerated failure time (AFT) model and frailty models. Students are expected to demonstrate their understanding of the models and interpret the results of applying them.
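For orientation, the Cox proportional hazards model with a single covariate \(x\) assumes the hazard

\[h(t\mid x)=h_0(t)\exp(\beta x),\]

and \(\beta\) is estimated by maximizing the partial likelihood

\[L(\beta)=\prod_{i:\,\delta_i=1}\frac{\exp(\beta x_i)}{\sum_{j\in R(t_i)}\exp(\beta x_j)},\]

where \(\delta_i\) is the event indicator and \(R(t_i)\) is the set of subjects still at risk at time \(t_i\). The Python sketch below maximizes this by Newton-Raphson on simulated, right-censored data. It assumes one covariate and no tied event times; the function name `fit_cox_1d` and all simulation settings are hypothetical, and real analyses would normally use an established package such as R's `survival::coxph`.

```python
import numpy as np

def fit_cox_1d(time, event, x, max_iter=50, tol=1e-10):
    """Newton-Raphson maximisation of the Cox partial likelihood, one covariate.

    Assumes no tied event times (ties would need a Breslow or Efron correction).
    """
    order = np.argsort(-time)          # sort by decreasing time
    d = event[order].astype(float)
    z = x[order]
    beta = 0.0
    for _ in range(max_iter):
        w = np.exp(beta * z)
        # After sorting, the risk set of subject i is rows 0..i, so cumulative
        # sums give the risk-set sums S0, S1, S2 at every event time.
        s0 = np.cumsum(w)
        s1 = np.cumsum(w * z)
        s2 = np.cumsum(w * z**2)
        score = np.sum(d * (z - s1 / s0))               # d logL / d beta
        info = np.sum(d * (s2 / s0 - (s1 / s0) ** 2))   # -d^2 logL / d beta^2
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical simulated data: baseline hazard 1, true beta 0.7,
# independent exponential censoring.
rng = np.random.default_rng(1)
n, true_beta = 2000, 0.7
x = rng.normal(size=n)
t_event = rng.exponential(scale=np.exp(-true_beta * x))  # hazard exp(beta * x)
t_cens = rng.exponential(scale=3.0, size=n)
time = np.minimum(t_event, t_cens)
event = t_event <= t_cens
beta_hat = fit_cox_1d(time, event, x)
print(f"estimated beta = {beta_hat:.3f}, events = {event.sum()} of {n}")
```

Note that the baseline hazard \(h_0(t)\) never has to be specified, which is the key practical appeal of the partial likelihood.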

3. Epidemiology: Viral Dynamic Modelling

Coronavirus disease 2019 (COVID-19), an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is spreading worldwide and has caused the global coronavirus pandemic. The viral dynamics of SARS-CoV-2 infection have not yet been investigated intensively in quantitative terms.

In this project, students are expected to understand viral dynamic modelling through ordinary differential equations (ODEs).
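One widely used starting point (assumed here, not specified in the outline) is the target-cell-limited model, in which uninfected target cells \(T\), infected cells \(I\) and free virus \(V\) satisfy

\[\frac{dT}{dt}=-\beta TV,\qquad \frac{dI}{dt}=\beta TV-\delta I,\qquad \frac{dV}{dt}=pI-cV,\]

with infection rate \(\beta\), infected-cell death rate \(\delta\), virion production rate \(p\) and viral clearance rate \(c\). Its within-host basic reproductive number is \(R_0=\beta pT_0/(\delta c)\). The Python sketch below integrates the model with a hand-written fourth-order Runge-Kutta scheme so that it needs no third-party packages. The parameter values are hypothetical, chosen only to produce a visible viral peak, and are not fitted to SARS-CoV-2 data.

```python
def target_cell_rhs(state, beta, delta, p, c):
    """Right-hand side of the target-cell-limited model (T, I, V)."""
    T, I, V = state
    return (
        -beta * T * V,             # target cells lost to infection
        beta * T * V - delta * I,  # new infections minus infected-cell death
        p * I - c * V,             # virion production minus clearance
    )

def rk4_step(f, state, h, *params):
    """One classical fourth-order Runge-Kutta step for y' = f(y)."""
    k1 = f(state, *params)
    k2 = f(tuple(y + 0.5 * h * k for y, k in zip(state, k1)), *params)
    k3 = f(tuple(y + 0.5 * h * k for y, k in zip(state, k2)), *params)
    k4 = f(tuple(y + h * k for y, k in zip(state, k3)), *params)
    return tuple(
        y + h / 6.0 * (a + 2 * b + 2 * c3 + d)
        for y, a, b, c3, d in zip(state, k1, k2, k3, k4)
    )

def simulate(T0, V0, beta, delta, p, c, days=30.0, h=0.01):
    """Integrate from (T0, 0, V0); return lists of times and (T, I, V) states."""
    state = (T0, 0.0, V0)
    times, states = [0.0], [state]
    for i in range(int(round(days / h))):
        state = rk4_step(target_cell_rhs, state, h, beta, delta, p, c)
        times.append((i + 1) * h)
        states.append(state)
    return times, states

# Hypothetical parameter values, for illustration only
T0, V0, beta, delta, p, c = 1e7, 1.0, 1e-6, 1.0, 10.0, 5.0
times, states = simulate(T0, V0, beta, delta, p, c)
V = [s[2] for s in states]
peak = max(range(len(V)), key=V.__getitem__)
print(f"R0 = {beta * p * T0 / (delta * c):.1f}")
print(f"peak viral load {V[peak]:.3g} at day {times[peak]:.2f}")
```

With these made-up values \(R_0=20\), so the viral load rises, peaks as target cells are depleted and then decays; estimating \(\beta\), \(\delta\), \(p\) and \(c\) from observed viral load data would be a natural next step in a project.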

4. Data Visualization, Data Simplification and Dimension Reduction Methods

High-dimensional datasets are difficult to plot and analyse. The more variables a dataset has, the more likely it is that some variables are highly correlated with others. When a variable can be expressed in terms of other variables, retaining it increases the complexity of the dataset without adding much information. This is the main reason for data simplification.

In this project, students are expected to understand some dimension reduction methods, such as eigendecomposition, singular value decomposition (SVD), principal component analysis (PCA) and factor analysis.

Students are expected to demonstrate their understanding of the methods and interpret the results of applying them to real data.
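To connect eigendecomposition, SVD and PCA concretely, the Python sketch below computes principal components from the SVD of the column-centred data matrix and checks that the variance proportions match the eigenvalues of the sample covariance matrix. The data are hypothetical: three variables generated from a single latent variable plus small noise, mimicking the situation described above where one variable can be expressed through the others. The function name `pca_svd` is illustrative only.

```python
import numpy as np

def pca_svd(X, n_components):
    """PCA of the columns of X via the SVD of the column-centred data.

    Returns component scores, principal axes (as rows) and the proportion
    of total variance explained by every component.
    """
    Xc = X - X.mean(axis=0)                          # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = s**2 / (X.shape[0] - 1)              # eigenvalues of sample covariance
    explained = variances / variances.sum()
    scores = U[:, :n_components] * s[:n_components]  # equals Xc @ Vt[:n_components].T
    return scores, Vt[:n_components], explained

# Hypothetical data: three variables driven by one latent factor z plus small noise
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = np.hstack([z,
               2 * z + 0.1 * rng.normal(size=(500, 1)),
               -z + 0.1 * rng.normal(size=(500, 1))])
scores, axes, explained = pca_svd(X, n_components=1)
print(np.round(explained, 4))

# Cross-check against the eigendecomposition of the sample covariance matrix
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]  # descending order
print(np.allclose(eigvals / eigvals.sum(), explained))
```

Almost all of the variance falls on the first component here, so the three variables can be summarised by one score with little loss of information. Factor analysis, by contrast, fits an explicit latent-variable model and is not reproduced by this sketch.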

5. A Review on Record-Based Transmuted Family of Distributions

Recently, much attention in distribution theory has focused on families of transmuted distributions, such as Shaw and Buckley's transmuted distributions, which are derived through a quadratic rank transmutation map. A stochastic construction of this family using order statistics has facilitated a generalization of Shaw and Buckley's original family of transmuted distributions.

Continuing with a similar idea, in this project students are expected to put forward a new family of transmuted distributions based on the theory of records, and then discuss its properties and some special cases of interest.
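For reference, Shaw and Buckley's quadratic rank transmutation map turns a baseline CDF \(F\) into

\[G(x)=(1+\lambda)F(x)-\lambda F(x)^2,\qquad |\lambda|\le 1.\]

The order-statistics construction can be checked directly: if \(X_1,X_2\) are independent with CDF \(F\), then \(\min(X_1,X_2)\) has CDF \(2F-F^2\) and \(\max(X_1,X_2)\) has CDF \(F^2\), so taking the minimum with probability \((1+\lambda)/2\) and the maximum otherwise gives a variable with CDF \(G\). The Python sketch below uses this to simulate from a transmuted exponential distribution and compares the empirical CDF with \(G\). The function names and the exponential baseline are assumptions made for illustration.

```python
import numpy as np

def transmuted_cdf(F, lam):
    """Quadratic rank transmutation map G = (1 + lam) * F - lam * F**2, |lam| <= 1."""
    if not -1 <= lam <= 1:
        raise ValueError("lam must lie in [-1, 1]")
    return (1 + lam) * F - lam * F**2

def sample_transmuted(base_sampler, lam, size, rng):
    """Sample from G using order statistics of two independent draws from F.

    min(X1, X2) has CDF 2F - F^2 and max(X1, X2) has CDF F^2; taking the
    minimum with probability (1 + lam) / 2 gives the mixture with CDF G.
    """
    if not -1 <= lam <= 1:
        raise ValueError("lam must lie in [-1, 1]")
    x1 = base_sampler(size, rng)
    x2 = base_sampler(size, rng)
    take_min = rng.random(size) < (1 + lam) / 2
    return np.where(take_min, np.minimum(x1, x2), np.maximum(x1, x2))

def exp_sampler(n, rng):
    """Standard exponential baseline, F(x) = 1 - exp(-x) (an illustrative choice)."""
    return rng.exponential(1.0, n)

rng = np.random.default_rng(1)
lam = 0.5
sample = sample_transmuted(exp_sampler, lam, 200_000, rng)
for x in (0.2, 0.5, 1.0, 2.0):
    empirical = np.mean(sample <= x)
    theoretical = transmuted_cdf(1 - np.exp(-x), lam)
    print(f"x = {x}: empirical {empirical:.4f}, G(x) = {theoretical:.4f}")
```

Setting \(\lambda=0\) recovers \(F\), while \(\lambda=1\) and \(\lambda=-1\) give the distributions of the minimum and the maximum respectively.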

6. Expected Shortfall and its Backtesting

Expected Shortfall (ES), also known as conditional Value-at-Risk (CVaR), is a risk measure used to quantify the market risk or credit risk of a portfolio. Several methods for backtesting ES have been proposed in the literature.

In this project, students are expected to understand backtesting methods for ES, such as the conditional test, the unconditional test and the quantile test, and to apply them using statistical software such as MATLAB or R.
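As a rough illustration, the sketch below estimates VaR and ES at tail level \(\alpha\) by historical simulation, taking \(\mathrm{ES}_\alpha\) as the average loss at or beyond \(\mathrm{VaR}_\alpha\), and evaluates an unconditional backtest statistic

\[Z_2=\sum_{t=1}^{T}\frac{X_tI_t}{T\alpha\,\mathrm{ES}_{\alpha,t}}+1,\qquad I_t=\mathbf{1}\{X_t+\mathrm{VaR}_{\alpha,t}<0\},\]

where \(X_t\) is the realised profit and loss and the VaR and ES forecasts are positive numbers. It is assumed here that the unconditional test in the outline refers to this statistic of Acerbi and Szekely (2014). Under a correct ES forecast \(Z_2\) is close to 0, and it becomes clearly negative when risk is underestimated. The data are simulated normal P&L rather than market data, and the function names are hypothetical.

```python
import math
import numpy as np

def historical_var_es(pnl, alpha=0.025):
    """Historical-simulation VaR and ES at tail level alpha, reported as positive losses."""
    losses = -np.asarray(pnl, dtype=float)
    var = np.quantile(losses, 1 - alpha)
    es = losses[losses >= var].mean()   # average loss at or beyond VaR
    return var, es

def z2_statistic(pnl, var, es, alpha=0.025):
    """Unconditional ES backtest statistic Z2 (assumed Acerbi-Szekely form).

    pnl: realised P&L X_t; var, es: positive VaR/ES forecasts (scalars or arrays).
    """
    pnl = np.asarray(pnl, dtype=float)
    var = np.broadcast_to(var, pnl.shape)
    es = np.broadcast_to(es, pnl.shape)
    hits = pnl + var < 0                # I_t = 1 when the loss exceeds VaR
    return np.sum(pnl * hits / (len(pnl) * alpha * es)) + 1.0

alpha = 0.025
# Exact forecasts for a standard normal P&L: VaR = 1.959964, ES = phi(VaR) / alpha
var_f = 1.959964
es_f = math.exp(-var_f**2 / 2) / math.sqrt(2 * math.pi) / alpha

rng = np.random.default_rng(42)
good = rng.normal(0.0, 1.0, 100_000)    # P&L consistent with the forecasts
bad = rng.normal(0.0, 1.5, 100_000)     # riskier P&L than forecast
print("historical VaR, ES:", historical_var_es(good, alpha))
print("Z2, correct model:", z2_statistic(good, var_f, es_f, alpha))
print("Z2, risk underestimated:", z2_statistic(bad, var_f, es_f, alpha))
```

In a real backtest the forecasts \(\mathrm{VaR}_{\alpha,t}\) and \(\mathrm{ES}_{\alpha,t}\) would vary from day to day, and the significance of \(Z_2\) would be judged against its simulated distribution under the forecast model.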

7. Cognitive Diagnosis Models and the DINA Model

Cognitive diagnosis models (CDMs) are a special type of restricted latent class model that has received a great deal of attention in the past several years. CDMs try to identify whether an examinee has mastered the specific skills required to solve an item. This objective, identifying strengths and weaknesses in a set of fine-grained skills (or attributes), differs from that of traditional measurement models.

In CDMs, a student is classified into latent classes defined by dichotomous (mastered or not mastered) skills, according to the student's responses on a test and a given Q-matrix, which specifies the skills required to master each item.

Q-matrix

\[Q=\begin{bmatrix} q_{11} & \cdots & q_{1k} &\cdots &q_{1K}\\ \vdots & & \vdots & &\vdots\\ q_{j1}& \cdots & q_{jk} & \cdots & q_{jK}\\ \vdots & & \vdots & &\vdots\\ q_{J1} & \cdots & q_{Jk} & \cdots & q_{JK} \end{bmatrix}\]

where \(J\) is the number of items, \(K\) is the number of attributes, and \(q_{jk}=1\) if item \(j\) requires attribute \(k\) and \(q_{jk}=0\) otherwise.
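The DINA (deterministic inputs, noisy "and" gate) model named in the title links the Q-matrix to item responses. With \(\alpha_{ik}\in\{0,1\}\) indicating whether examinee \(i\) has mastered attribute \(k\), let

\[\eta_{ij}=\prod_{k=1}^{K}\alpha_{ik}^{q_{jk}},\qquad P(X_{ij}=1\mid\boldsymbol{\alpha}_i)=(1-s_j)^{\eta_{ij}}\,g_j^{1-\eta_{ij}},\]

where \(s_j\) is the slip parameter and \(g_j\) the guessing parameter of item \(j\). The Python sketch below computes these probabilities and classifies an examinee into the attribute profile with the highest likelihood. The Q-matrix, slip and guessing values are hypothetical, and the item parameters are treated as known; in practice they are estimated, for example with the EM algorithm, and a prior over profiles is often used.

```python
import itertools
import numpy as np

def dina_eta(alpha, Q):
    """eta[i, j] = 1 if profile i masters every attribute that item j requires."""
    return np.all(alpha[:, None, :] >= Q[None, :, :], axis=2).astype(int)

def dina_prob(alpha, Q, slip, guess):
    """P(X_ij = 1 | alpha_i) under DINA: 1 - s_j if eta_ij = 1, else g_j."""
    return np.where(dina_eta(alpha, Q) == 1, 1 - slip, guess)

def classify(x, Q, slip, guess):
    """Maximum-likelihood attribute profile for one 0/1 response vector x.

    Enumerates all 2^K profiles; item parameters are treated as known.
    """
    K = Q.shape[1]
    profiles = np.array(list(itertools.product([0, 1], repeat=K)))
    p = dina_prob(profiles, Q, slip, guess)          # shape (2^K, J)
    loglik = (x * np.log(p) + (1 - x) * np.log(1 - p)).sum(axis=1)
    return profiles[np.argmax(loglik)]

# Hypothetical example with J = 4 items and K = 2 attributes
Q = np.array([[1, 0],
              [0, 1],
              [1, 1],
              [1, 0]])
slip = np.full(4, 0.1)
guess = np.full(4, 0.2)
print(classify(np.array([1, 0, 0, 1]), Q, slip, guess))  # -> [1 0]
```

An examinee who answers only the items requiring attribute 1 alone is classified as having mastered attribute 1 but not attribute 2, which is exactly the kind of fine-grained diagnosis CDMs aim for.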