Computational modeling of protein-DNA and -RNA binding

Our research focuses on computational modeling of protein-DNA and -RNA interactions. These interactions are key regulators of gene expression and are therefore involved in almost every process in the cell, as well as in many diseases. Technologies that measure these interactions produce thousands to millions of data points in a single experiment, far more than can be analyzed without computational methods.

We develop efficient computational methods to analyze these data and to build accurate models that predict new interactions and improve our understanding of the binding process under study. To handle the vast amounts of data, we develop efficient methods to process the data and extract the relevant information, to which we then apply learning methods to infer accurate predictive models.

Deep learning for computational biology

We are excited to apply the most advanced machine learning methods to generate more accurate protein-DNA, -RNA and -peptide binding models. Recent advances in neural networks, known as deep learning, have attracted much attention in the computational biology field.

We are applying deep learning successfully to many high-throughput datasets, and plan to take it further by integrating several orthogonal data sources to improve in vivo binding prediction.
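Convolutional networks are one common architecture for sequence-binding models. The minimal sketch below, written in plain Python for illustration only and not a description of our actual models, shows the core operation of such a network's first layer: a one-hot encoded DNA sequence is scanned with a motif-like filter, passed through a ReLU, and max-pooled to give the strongest match anywhere in the sequence. The function names, the hand-set filter and the bias value are all hypothetical; in a trained network the filter weights would be learned from data.

```python
# Illustrative sketch (hypothetical names/values): first-layer operation of a
# convolutional binding model on DNA. Not the lab's implementation.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element indicator vectors (A, C, G, T)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def conv_scan(x, filt, bias=0.0):
    """Slide a (width x 4) filter along the encoded sequence.

    At each offset, sum the element-wise products of filter and input,
    add the bias, and apply a ReLU nonlinearity.
    """
    width = len(filt)
    activations = []
    for i in range(len(x) - width + 1):
        score = bias
        for j in range(width):
            score += sum(w * v for w, v in zip(filt[j], x[i + j]))
        activations.append(max(0.0, score))  # ReLU
    return activations


def max_pool(activations):
    """Global max-pooling: the strongest filter match anywhere in the sequence."""
    return max(activations) if activations else 0.0


# Hypothetical filter that rewards an exact "GATA" match; bias -3 means only
# a perfect 4-of-4 match produces a positive activation.
filt = one_hot("GATA")
acts = conv_scan(one_hot("TTGATACC"), filt, bias=-3.0)
print(acts)            # activation per offset
print(max_pool(acts))  # best match score
```

In a real model, many such filters run in parallel and their pooled outputs feed further layers that predict a binding score.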

Combinatorial algorithms for sequence design problems

In addition, we develop algorithms that generate compact universal sequence libraries under various biological constraints, improving experimental throughput and enabling novel discoveries.

We draw on our experience in graph theory to solve combinatorial problems in sequence design, using approaches such as flow and matching algorithms, integer linear programming (ILP) formulations and greedy heuristics.
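A classical example of a compact universal sequence is a de Bruijn sequence, which contains every k-mer exactly once (cyclically) and can be built with graph theory by finding an Eulerian cycle in the de Bruijn graph. The sketch below is an illustration of that textbook construction, not our library-design algorithm: it uses Hierholzer's algorithm and ignores the biological constraints discussed above. The function name `de_bruijn` is hypothetical.

```python
from itertools import product


def de_bruijn(k, alphabet="ACGT"):
    """Return a cyclic de Bruijn sequence of order k over `alphabet`.

    Graph view: nodes are (k-1)-mers and each k-mer is an edge from its
    (k-1)-prefix to its (k-1)-suffix. An Eulerian cycle, found here with
    Hierholzer's algorithm, uses every edge (i.e. every k-mer) exactly once.
    """
    if k == 1:
        return alphabet  # every 1-mer once: the alphabet itself
    # Remaining unused outgoing edges per node, labelled by the appended symbol.
    out_edges = {"".join(p): list(alphabet) for p in product(alphabet, repeat=k - 1)}
    start = alphabet[0] * (k - 1)
    stack, circuit = [start], []
    while stack:
        node = stack[-1]
        if out_edges[node]:
            sym = out_edges[node].pop()
            stack.append(node[1:] + sym)  # traverse edge for k-mer node + sym
        else:
            circuit.append(stack.pop())  # dead end: add node to the circuit
    circuit.reverse()  # now starts and ends at `start`, len(alphabet)**k + 1 nodes
    # The last symbol of each node after the start spells the cyclic sequence.
    return "".join(node[-1] for node in circuit[1:])


seq = de_bruijn(3)
print(len(seq))  # 4**3 distinct 3-mers covered in a cyclic sequence of this length
```

To obtain a linear sequence covering all k-mers, append its first k - 1 symbols; for example, all 10-mers fit in 4^10 + 9 = 1,048,585 bases. Practical library design adds constraints (for instance, splitting the sequence into fixed-length probes), which this sketch does not handle.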