Two Issues in Database Search Methods for Mass Spectrometry-based Peptide Identification
10am
Room 2463 (Lifts 25 & 26), 2/F Academic Building, HKUST

Thesis Examination Committee

Prof Ning LI, LIFS/HKUST (Chairperson)
Prof Weichuan YU, ECE/HKUST (Thesis Supervisor)
Prof Henry Hei Ning LAM, CBE/HKUST

Abstract

Mass spectrometry (MS) is currently the mainstream technique for analyzing protein samples. As a fundamental task in tandem mass spectrometry experiments, peptide identification plays an essential role in providing sequence information for protein analysis. Although traditional peptide identification algorithms are mature, they fail to provide satisfactory results when post-translational modifications (PTMs) and cross-linking techniques are taken into consideration, because of the heavy computation involved. Existing methods that specialize in PTM identification and cross-linked peptide identification suffer from the following issues:

  1. In PTM identification, the number of PTMs that can be specified during the search is limited.
  2. In cross-linked peptide identification, very few tools can finish the search within an acceptable time when a large database is used.

In this thesis, a simulation-based study of tag-based database search for peptide identification with PTMs is proposed to tackle the first issue. In addition, a novel linear-time algorithm for cross-linked peptide identification is presented to address the second. The simulation-based study uses the tag-based database search method to identify core peptides, i.e., the peptide backbones without PTM annotations. It demonstrates how the performance of tag-based database search changes with the quality of the tandem mass spectra (MS2), and provides a model for predicting that performance. The feasible region of tag-based database search is then obtained so that the method can be used reliably.

The novel linear-time algorithm for cross-linked peptide identification solves several remaining issues in existing methods. The new algorithm is implemented in a tool called Xolik. It provides precise numerical computations while achieving linear time complexity. Experiments using synthetic and empirical datasets show that it outperforms existing tools in terms of running time and statistical power. Theoretical proofs of the correctness and time complexity of this algorithm are also provided in the thesis.

Speakers / Performers:
Jiaan DAI
Language
English