mixOmics Student Intern Project

Introduction

mixOmics is a large R package that provides statistical methods to integrate omics data sets (e.g. transcriptomics, proteomics, metabolomics, metagenomics) that simultaneously measure the activity of thousands of biological features (e.g. transcripts, proteins, metabolites, bacteria). Data integration enables the identification of specific biological relationships between these features (e.g. genes and proteins), yielding new insights into the molecular processes involved in health and disease. mixOmics includes 19 data integration methods, 13 of which were developed in our lab. These methods are all based on dimension reduction using Projection to Latent Structures (PLS).

Our users include computational biologists, molecular biologists and bioinformaticians who wish to integrate their data and identify signatures of genes, proteins, etc. to explain or predict a disease outcome. The package (ranked in the top 5% of Bioconductor packages) is easy to use because all methods rely on the same underlying PLS principles and produce numerous graphics for interpretation (Fig. 1). We continuously improve the mixOmics package based on community feedback.

Duties while on placement

As this is a large project, the internship requires complementary skillsets to:

improve specific aspects of the package (e.g. increase unit test coverage, troubleshoot bugs, implement new features requested by users, improve code quality, develop new graphics)
improve our existing tutorials and develop new ones on www.mixOmics.org
where opportunity and motivation allow, respond to users' questions on our discussion forum at https://mixomics-users.discourse.group. This requires a good mastery of the methods, so it would only apply towards the end of the internship.

After the (steep) learning phase, there will be opportunities for students to propose new features and functionalities in the package if they wish.

Figure 1. Overview of the methods in mixOmics for data exploration and integration of multiple omics data sets (courtesy of Prof. Lê Cao)

Skills and Pre-requisites

Very good knowledge of R and the Linux command line
Ability to learn and understand high-level statistical concepts quickly
Ability to work independently, report to a group, and discuss theories and results
Excellent skills in statistical analysis of complex data
Ability to work with GitHub
Ability to interact with users
Interest in biological applications

Benefits for students

Benefits for students whilst undertaking the internship include:

Gain hands-on experience working in an emerging research software environment.
Gain an understanding of how real-world software is assessed and developed, and how priorities and requirements are established within a research environment.
Gain an understanding of the importance of maintainable, scalable and extensible code.
Improve oral and written communication skills in a team environment.
Learn about new statistical methods for mining large data sets
Learn about high-throughput biology
