# Bayesian Nonparametrics for Biophysics

The main goal of data analysis is to summarize large amounts of observed data with a few numbers that give us some intuition about the process that generated the data. Regardless of the method used, the analysis follows the same steps: (1) formulating the problem mathematically, (2) collecting the data, (3) constructing a probability model for the data, (4) estimating the parameters of the model, and (5) summarizing the results appropriately. This process is called "statistical inference".

It has recently been suggested that the Bayesian approach, and more specifically Bayesian nonparametrics (BNPs), can have a deep influence on data analysis [1], and their applications in biophysics have only just begun to be explored [2–4]. However, to the best of our knowledge, no single resource yet explains both the concepts and the implementation of BNPs, as would be needed to bring their capabilities to bear on data analysis and to accelerate their inevitable widespread adoption.

Therefore, in this dissertation, we describe the concepts and implementation of BNPs as an important computational tool for data analysis, with a focus on their application in biophysics. The goal is to use BNPs to understand the rules of life (in vivo) at the scale at which life occurs (single molecules) from the fastest acquirable data (single photons).

In Chapter 1, we give a brief introduction to data analysis in biophysics. The overview is aimed at anyone, from student to established researcher, who wants to understand what statistical modeling can accomplish and where the field of data analysis in biophysics is headed. For readers just getting started, we place special emphasis on the logic, strengths, and shortcomings of data analysis frameworks, with a focus on very recent approaches.

In Chapter 2, we provide an overview of data analysis in single-molecule biophysics. We discuss data analysis tools and the model selection problem, focusing mainly on the Bayesian approach. We also discuss BNPs and the distinctive characteristics that make them ideal mathematical tools for modeling complex biomolecules: they offer clear physical interpretations and allow full posterior probabilities over molecular-level models to be deduced with minimal subjective choices.
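To make the idea of a prior over models with an unbounded number of components concrete, the sketch below draws mixture weights from a Dirichlet process prior using truncated stick-breaking. This is a generic illustration assuming NumPy: the Dirichlet process is one common BNP prior, not necessarily the prior used in the later chapters, and the function name, concentration `alpha`, and truncation level `K` are hypothetical choices.

```python
import numpy as np

def stick_breaking_weights(alpha, K, rng):
    """Draw K truncated stick-breaking weights for a Dirichlet process prior.

    alpha: concentration parameter (larger values spread weight over more components)
    K: truncation level; the last component absorbs the leftover stick so weights sum to 1
    """
    # Fraction of the remaining stick broken off at each step: v_k ~ Beta(1, alpha)
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0  # truncation: final component takes all remaining mass
    # Stick length left before break k: prod_{j<k} (1 - v_j)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

rng = np.random.default_rng(0)
w = stick_breaking_weights(alpha=1.0, K=50, rng=rng)
# Number of components with non-negligible prior weight (threshold is arbitrary)
print(np.sum(w > 0.01), w.sum())
```

Although 50 components are available, most of the prior mass typically falls on a handful of them; this is the mechanism by which such priors let the data, rather than a fixed modeling choice, determine how many components are needed.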

In Chapter 3, we turn to spectroscopic approaches and fluorescence time traces. These traces are used to report on the dynamical features of biomolecules, and the fundamental unit of information they carry is the single photon. Individual photons carry information from the biomolecule that emitted them to the detector on timescales as fast as microseconds. Therefore, in a confocal microscope it is, in principle, feasible to monitor biomolecular dynamics on such timescales. In practice, however, the signals are stochastic, and to derive dynamical information through traditional means such as fluorescence correlation spectroscopy (FCS) and related methods, fluorescence time traces are collected and temporally autocorrelated over many minutes. Until now, it has not been feasible to analyze the dynamics of biomolecules on timescales close to those of data acquisition, as this requires estimating the number of biomolecules emitting photons and their locations within the confocal volume. The mathematical structure of this problem forces us to leave the usual ("parametric") Bayesian paradigm. Here, we use novel mathematical tools, BNPs, that allow us to extract, in a principled fashion, the same information normally obtained from FCS, but from the direct analysis of significantly smaller datasets, starting from individual photon arrivals. Specifically, we infer the diffusion coefficient of the molecules. Diffusion is how molecules find each other in a cell, and the diffusion coefficient quantifies how fast they move; at the cellular level, determining it can provide valuable insight into how molecules interact with their environment. We discuss how this method can significantly reduce phototoxic damage to the sample and makes it possible to monitor the dynamics of biomolecules, even down to the single-molecule level, on such timescales.
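For context, the traditional FCS route can be sketched as follows. This is the conventional analysis that the BNP approach is contrasted with, not the BNP method itself. It assumes NumPy, a binned photon-count trace, a single freely diffusing species, and an idealized 3D Gaussian detection volume with no triplet or blinking terms; the function names and the example beam waist and diffusion time are hypothetical.

```python
import numpy as np

def autocorrelation(counts, max_lag):
    """Normalized FCS autocorrelation G(tau) = <dI(t) dI(t+tau)> / <I>^2
    of a binned photon-count trace, at lags 1..max_lag (in bins)."""
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean()
    d = counts - mean
    n = len(counts)
    return np.array([np.mean(d[:n - k] * d[k:]) for k in range(1, max_lag + 1)]) / mean**2

def g_3d_diffusion(tau, n_mol, tau_d, kappa):
    """Standard 3D Gaussian-focus model for free diffusion:
    G(tau) = (1/N) (1 + tau/tau_D)^-1 (1 + tau/(kappa^2 tau_D))^-1/2,
    where kappa is the axial-to-lateral beam-waist ratio."""
    return (1.0 / n_mol) / (1.0 + tau / tau_d) / np.sqrt(1.0 + tau / (kappa**2 * tau_d))

def diffusion_coefficient(w_xy, tau_d):
    """Lateral diffusion time tau_D = w_xy^2 / (4 D), so D = w_xy^2 / (4 tau_D)."""
    return w_xy**2 / (4.0 * tau_d)

# Uncorrelated (shot-noise-only) counts: G(tau > 0) should be close to zero.
rng = np.random.default_rng(1)
white = rng.poisson(5.0, size=200_000)
print(autocorrelation(white, 5))

# Hypothetical example: 0.25 um lateral waist, 1 ms diffusion time -> D in um^2/s
print(diffusion_coefficient(0.25, 1e-3))
```

In practice, `g_3d_diffusion` would be fit to the measured `autocorrelation` curve (with lags converted from bins to seconds) to obtain `tau_d`. The estimate of G(τ) must average over many molecular transits through the focus before such a fit is reliable, which is part of why FCS traditionally needs long acquisitions.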

In Chapter 4, we present a new approach to inferring fluorescence lifetimes. In general, fluorescence lifetime imaging (FLIM) provides information on the number of species and their associated lifetimes. Current lifetime data analysis methods rely on either time-correlated single photon counting (TCSPC) or phasor analysis. These methods require large numbers of photons to converge to the correct lifetimes and do not determine how many species are responsible for those lifetimes. Here, we propose a new method to analyze lifetime data based on BNPs that explicitly accounts for several experimental complexities. Using BNPs, we can identify not only the most probable number of species but also their lifetimes, with at least an order of magnitude less data than competing methods (TCSPC or phasors). To evaluate our method, we test it on both simulated and experimental data for one, two, three, and four species, with both stationary and moving molecules. We also compare our species estimates and lifetime determinations with both TCSPC and phasor analysis for different numbers of photons used in the analysis.
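As a point of comparison, the standard phasor calculation can be sketched as below. It assumes NumPy and idealized data (no instrument response function, background, or detector dead time); the 40 MHz repetition rate, the lifetimes, and the function names are hypothetical. Each photon's arrival time after the excitation pulse is mapped to the first-harmonic phasor coordinates g = ⟨cos ωt⟩ and s = ⟨sin ωt⟩, from which a phase lifetime τ_φ = s/(ωg) and a modulation lifetime τ_m = (1/ω)·sqrt(1/(g² + s²) − 1) follow.

```python
import numpy as np

def phasor(arrival_times_ns, rep_rate_mhz):
    """First-harmonic phasor (g, s) of photon arrival times (ns after the pulse)."""
    omega = 2 * np.pi * rep_rate_mhz * 1e-3  # angular frequency in rad/ns
    t = np.asarray(arrival_times_ns, dtype=float)
    return np.mean(np.cos(omega * t)), np.mean(np.sin(omega * t)), omega

def phase_and_modulation_lifetimes(g, s, omega):
    """Both equal the true lifetime for a single exponential decay."""
    tau_phi = s / (omega * g)
    tau_mod = np.sqrt(1.0 / (g**2 + s**2) - 1.0) / omega
    return tau_phi, tau_mod

rng = np.random.default_rng(2)
rep_rate = 40.0  # MHz (hypothetical)

# One species with tau = 2.5 ns: both lifetime estimates land near 2.5 ns
single = rng.exponential(2.5, size=100_000)
g1, s1, w1 = phasor(single, rep_rate)
tau_phi1, tau_mod1 = phase_and_modulation_lifetimes(g1, s1, w1)
print(tau_phi1, tau_mod1)

# 50/50 mixture of 1 ns and 4 ns species: the two estimates disagree and the
# phasor falls inside the universal semicircle (g - 1/2)^2 + s^2 = 1/4
mix = np.concatenate([rng.exponential(1.0, 50_000), rng.exponential(4.0, 50_000)])
g2, s2, w2 = phasor(mix, rep_rate)
tau_phi2, tau_mod2 = phase_and_modulation_lifetimes(g2, s2, w2)
print(tau_phi2, tau_mod2)
```

The mixture phasor is simply the intensity-weighted average of the single-species phasors, so a single averaged point does not by itself reveal how many species produced it. This illustrates the limitation noted above for conventional lifetime analysis.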

In conclusion, the basis of every spectroscopic method is the detection of photons. Photon arrivals encode complex dynamical and chemical information, and methods that analyze such arrivals can reveal dynamical and chemical processes on fast timescales. Here, we turn our attention to fluorescence lifetime imaging and single-spot fluorescence confocal microscopy, where individual photon arrivals report on dynamics and chemistry down to the single-molecule level. This could not previously be achieved because of the uncertainty in the number of chemical species and the number of molecules contributing to the signal (i.e., contributing photons). That is, to learn dynamical or kinetic parameters (such as diffusion coefficients or lifetimes), we need to be able to interpret which photon is reporting on which process. For this reason, we abandon the parametric Bayesian paradigm and adopt the nonparametric paradigm, which allows us to flexibly explore and learn the number of molecules and the chemical reaction space. We demonstrate the power of BNPs over traditional methods in single-spot confocal analysis and in fluorescence lifetime imaging.
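The question of "which photon is reporting on which process" can be illustrated in its simplest parametric form. The sketch below, assuming NumPy, computes the posterior probability that each photon came from each species in a mixture of exponential decays whose lifetimes and weights are known, ignoring the instrument response and background. The function name and values are hypothetical. Because the number of species is fixed here, this is precisely the piece that a nonparametric treatment generalizes by also placing a prior on how many species are present.

```python
import numpy as np

def photon_responsibilities(t, lifetimes, weights):
    """Posterior probability that each photon (arrival time t, ns) came from each
    species, for a mixture of exponential decays with KNOWN lifetimes and weights."""
    t = np.asarray(t, dtype=float)[:, None]             # shape (n_photons, 1)
    tau = np.asarray(lifetimes, dtype=float)[None, :]   # shape (1, K)
    # log[ weight_k * (1/tau_k) * exp(-t/tau_k) ] for every photon/species pair
    log_p = np.log(np.asarray(weights, dtype=float))[None, :] - np.log(tau) - t / tau
    log_p -= log_p.max(axis=1, keepdims=True)           # stabilize before exponentiating
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)             # normalize over species

r = photon_responsibilities([0.1, 2.0, 10.0], lifetimes=[1.0, 4.0], weights=[0.5, 0.5])
print(np.round(r, 3))
```

Early photons are attributed mostly to the short-lifetime species and late photons almost entirely to the long-lifetime one, while photons in between remain ambiguous. That residual ambiguity is why the assignments are treated probabilistically rather than fixed.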