Outline

Big data is a hot topic in a variety of fields, such as biology and finance. The challenge of understanding these data has led to the development of new statistical tools and to the emergence of new areas such as machine learning and bioinformatics. Many of these tools are built on traditional statistical thinking but are often expressed in different terminology.

In this project, students are expected to choose one or more of the following topics and apply the methods to real datasets: logistic regression, classification and regression trees, neural networks, boosting, bagging, unsupervised learning, signal processing, random forests, data visualization, and so on. Students are expected to demonstrate their understanding of the tools, apply them using software, and interpret the software's results.

Notes

  • Some Basic Plotting Note

  • Random forest Note

  • Others

Past Students

Interesting Reading

  • How the global health security index and environment factor influence the spread of COVID-19: A country level analysis