Vita/Revised: September 2008
SOMNATH DATTA
U of L Office:
Department of Bioinformatics & Biostatistics
School of Public Health and Information Sciences
University of Louisville
Louisville, KY 40202
(502) 852 6376 (phone)
(502) 852 3294 (fax)
Home Office:
1518 Crosstimbers Drive
Louisville, KY 40245
(502) 245 3504 (phone/fax)
PERSONAL:
Born 1962, Calcutta (now Kolkata), India; Citizen, USA; Married to Susmita; one child, Anisha.
EDUCATION:
· Ph. D. (1988), Statistics and Probability, Michigan State University, East Lansing.
· M. Stat. (1985), Mathematical Statistics and Probability, Indian Statistical Institute, Calcutta.
· B. Stat. (1983), Statistics, Indian Statistical Institute, Calcutta.
ACADEMIC AND PROF
· 2005 (Summer) - present: Professor (tenured), Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY, USA.
· 1998 (Fall) - 2005 (Spring): Professor, Department of Statistics, University of Georgia, Athens, GA, USA.
· 1993 (Fall) - 1998 (Summer): Associate Professor (tenured), Department of Statistics, University of Georgia, Athens, GA, USA.
· 1988 (Fall) - 1993 (Summer): Assistant Professor, Department of Statistics, University of Georgia, Athens, GA, USA.
RESEARCH:
Ph. D. Dissertation Title: "Asymptotically Optimal Bayes Compound and Empirical Bayes Estimators in Exponential Families with Compact Parameter Space" (Professor James F. Hannan, Ph. D. dissertation advisor).
Research Interests (past & present): Biostatistics, Bioinformatics, Bootstrap Methods, Compound Decision, Analysis of Clustered Data, Clustering and Classification, Empirical Bayes, Genomics, Nonparametrics, Proteomics, Rank Tests, Survival Analysis, Time Series Analysis.
Clinical: Autism, Spinal Cord Injury.
PUBLICATIONS:
85. Pihur, V., Datta, S. and Datta, S. RankAggreg, an R package for weighted rank aggregation. Submitted.
84. Wang, M., Kong, M. and Datta, S. Inference for marginal linear models with clustered longitudinal data with potentially informative cluster sizes. Submitted.
83. Lan, L. and Datta, S. Comparison of state occupation, entry, exit and waiting times in K independent multistate models under current status data. Submitted.
82. Datta, S., Lorenz, D. J., Morrison, S., Ardolino, E., Harkema, S. J. A multivariate examination of temporal changes in Berg variables for patients with AIS C and D spinal cord injuries. Archives of Physical Medicine and Rehabilitation, to appear (2008).
81. Datta, S., Bandyopadhyay, D. and Satten, G. A. Inverse probability of censoring weighted U-statistics for right censored data with applications. Submitted.
80. Datta, S., Lan, L. and Sundaram, R. Nonparametric estimation of waiting time distributions in a Markov model based on current status data. Submitted.
79. Lan, L. and Datta, S. Nonparametric estimation of state occupation, entry and exit times with multistate current status data. Statistical Methods in Medical Research, doi:10.1177/0962280208094278 (2008).
78. Pihur, V., Datta, S. and Datta, S. Finding cancer genes through meta-analysis of microarray experiments: Rank aggregation via the cross entropy algorithm. Genomics, doi:10.1016/j.ygeno.2008.05.003 (2008).
77. Pihur, V., Datta, S. and Datta, S. Reconstruction of genetic association networks from microarray data: A partial least squares approach. Bioinformatics, 24, 561-568 (2008).
76. Datta, S. Classification of breast cancer versus normal samples from mass spectrometry profiles using linear discriminant analysis of important features selected by Random Forest, Statistical Applications in Genetics and Molecular Biology, 7 (2), Article 7 (2008).
75. Brock, G., Pihur, V., Datta, S. and Datta, S. clValid, an R package for cluster validation. Journal of Statistical Software, 25, 4 (2008).
74. Pihur, V., Brock, G., Datta, S. and Datta, S. Cluster validation for microarray data: An appraisal. In Multivariate Statistical Methods, (A. SenGupta, ed), ISI Platinum Jubilee series, World Scientific Press, to appear (2008).
73. Datta, S., Datta, S., Parrish, R. S. and Thompson, C. M. Microarray data analysis, In Computational Methods in Biomedical Research, (R. Khattree and D. Naik, eds.), Chapman & Hall/CRC Biostatistics Series, Volume 24, 1-43 (2008).
72. Datta, S. and Satten, G. A. A signed-rank test for clustered data. Biometrics, 64, 501-507 (2008).
71. Bandyopadhyay, D. and Datta, S. A novel approach to testing equality of survival distributions when the population marks are missing. Journal of Statistical Planning and Inference, 138, 1722-1732 (2008).
70. Pihur, V., Datta, S. and Datta, S. Understanding Chronic Fatigue Syndrome (CFS) from CAMDA data: A systems biology approach. In CAMDA 2007 Proceedings, online @ http://camda.bioinfo.cipf.es/camda07/agenda/detailed.html (2007).
69. Johnson, S. B., Datta, S., Hornung, C. A., Casanova, M. F. Mathematical models of epigenetic influences in Autism: a new perspective based on neuropathological findings. In Progress in Autism Research, (Paul C. Carlisle, ed), Nova Science Publishers, Inc., 101-114, New York: New York (2007).
68. Pihur, V., Datta, S. and Datta, S. Weighted rank aggregation of cluster validation measures: A Monte Carlo cross-entropy approach. Bioinformatics, 23, 1607-1615 (2007).
67. Boratyn, G. M., Datta, S. and Datta, S. Incorporation of biological knowledge into distance for clustering genes, Bioinformation, 1, 396-405 (2007).
66. Datta, S., Le-Rademacher, J. and Datta, S. Predicting patient survival from microarray data by accelerated failure time modeling using partial least squares and LASSO. Biometrics, 63, 259-271 (2007).
65. Zheng, H., Basawa, I. V. and Datta, S. First order random coefficient autoregressive processes, Journal of Statistical Planning and Inference, 137, 212-229 (2007).
64. Datta, S. and Datta, S. Combining functional information in selecting clustering algorithms. In Proceedings of Interface 2005, on CD-ROM (2006).
63. Datta, S. and Datta, S. Evaluation of clustering algorithms for gene expression data, BMC Bioinformatics, 7 (Suppl 4): S17 (2006).
62. Datta, S. and Datta, S. Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes, BMC Bioinformatics, 7:397 (2006).
61. Boratyn, G. M., Datta, S. and Datta, S. Biologically supervised hierarchical clustering algorithms for gene expression data, In Proceedings of the 28th IEEE EMBS Annual International Conference, New York City, USA, 5515-5518 (2006).
60. Zheng, H., Basawa, I. V. and Datta, S. The p-th order random coefficient autoregressive processes, Journal of Time Series Analysis, 27, 411-440 (2006).
59. Datta, S. and Sundaram, R. Nonparametric marginal estimation in a multistage model using current status data, Biometrics, 62, 829–837 (2006).
58. Datta, S. and Datta, S. Validation measures for clustering algorithms incorporating biological information, In IEEE Proceedings of International Multi-Symposiums on Computer and Computational Sciences (IMSCCS'06), (J. Ni, J. Dongarra, Y. Zheng, G. Gu, G. Wolfgang and H. Jin, eds.), 1, 131-135 (2006). http://doi.ieeecomputersociety.org/10.1109/IMSCCS.2006.139
57. Datta, S. Estimating the mean life time using right censored data. Statistical Methodology, 2, 65-69 (2005).
56. Datta, S. and Datta, S. Empirical Bayes screening (EBS) of many p-values with applications to microarray studies, Bioinformatics, 21, 1987-1994 (2005).
55. Datta, S. and Satten, G. A. Rank-sum tests for clustered data, Journal of the American Statistical Association, 100, 908-915 (2005).
54. Datta, S. Bootstrapping, In Encyclopedia of Statistical Sciences, Second edition, Wiley (2005).
53. Datta, S. Empirical Bayes methods, In Encyclopedia of Statistical Sciences, Second edition, Wiley (2005).
52. Satten, G. A., Datta, S., Moura, H., Woolfitt, A., Carvalho, G., De, B. K., Pavlopoulos, A., Carlone, G. M., and Barr, J. Standardization and denoising algorithms for mass spectra to classify whole-organism bacterial specimens, Bioinformatics, 20, 3128-3136 (2004).
51. Datta, S. and Datta, S. An empirical Bayes adjustment to multiple p-values for the detection of differentially expressed genes in microarray experiments. In APBC 2004, (Y-P. P. Chen, ed.), 29, 155-159 (2004).
50. Datta, S., Satten, G. A., Benos, D. J., Xia, J., Heslin, M., and Datta, S. An empirical Bayes adjustment to increase the sensitivity of detecting differentially expressed genes in microarray experiments, Bioinformatics, 20, 235-242 (2004).
49. Satten, G. A. and Datta, S. Marginal Analyses of Multistage Data. In Handbook of Statistics (N. Balakrishnan and C. R. Rao, eds.), 23, 559-574, Elsevier-North Holland (2004).
48. Chakraborty, S. and Datta, S. How will plant pathogens adapt to host plant resistance at elevated CO2 under a changing climate? New Phytologist, 159, 733-742 (2003).
47. Datta, S. and Datta, S. Comparisons and validation of statistical clustering techniques for microarray gene expression data. Bioinformatics, 19, 459-466 (2003).
46. Williamson, J., Datta, S., and Satten, G. A. Marginal analyses of clustered data when cluster size is informative. Biometrics, 59, 36-42 (2003).
45. Datta, S. and Satten, G. A. Estimation of integrated transition hazards and stage occupation probabilities for non-Markov systems under stage dependent censoring. Biometrics, 58, 792-802 (2002).
44. Satten, G. A. and Datta, S. Marginal estimation for Multistage models: waiting time distributions and competing risk analyses. Statistics in Medicine, 21, 3-19 (2002).
43. Datta, S. and Satten, G. A. Validity of the Aalen-Johansen estimators of stage occupation probabilities and integrated transition hazards for non-Markov models. Statistics and Probability Letters, 55, 403-411 (2001).
42. Satten, G. A., Datta, S. and Robins, J. M. An estimator for the survival function when data are subject to dependent censoring. Statistics and Probability Letters, 54, 397-403 (2001).
41. Satten, G. A. and Datta, S. The Kaplan-Meier Estimator as an inverse-probability-of-censoring weighted average. American Statistician, 55, 207-210 (2001).
40. Williamson, J. M., Satten, G. A., Hanson, J. A., Weinstock, H., and Datta, S. Analysis of dynamic cohort data. American Journal of Epidemiology, 154, 366-372 (2001).
39. Li, G. and Datta, S. A bootstrap approach to nonparametric regression for right censored data. Annals of the Institute of Statistical Mathematics, 53, 708-729 (2001).
38. Datta, S. and Satten, G. A. Estimating future stage entry and occupation probabilities in a multistage model based on randomly right-censored data. Statistics and Probability Letters, 50, 89-95 (2000).
37. Satten, G. A. and Datta, S. A simulate-update algorithm for missing data problems. Computational Statistics, 15, 243-277 (2000).
36. Datta, S. Empirical Bayes estimation with non-identical components. Journal of Nonparametric Statistics, 12, 709-725 (2000).
35. Datta, S., Satten, G. A. and Datta, S. Nonparametric estimation for the three-stage irreversible illness-death model. Biometrics, 56, 841-847 (2000).
34. Datta, S., Satten, G. A. and Williamson, J. M. Consistency and asymptotic normality of estimators in a regression model with interval censoring and left truncation. Annals of the Institute of Statistical Mathematics, 52, 160-172 (2000).
33. Datta, S., Satten, G. A. and Datta, S. Estimation of stage occupation probabilities in multistage models. In Advances on Theoretical and Methodological Aspects of Probability and Statistics, (N. Balakrishnan, ed.), 493-506 (2000), Gordon and Breach.
32. Satten, G. A., Janssen, R., Busch, M. P., and Datta, S. Validating marker-based incidence estimates in repeatedly screened population. Biometrics, 55, 1224-1227 (1999).
31. Allen, M. R. and Datta, S. Estimation of the index parameter for autoregressive data using the estimated innovations. Statistics and Probability Letters, 41, 315-324 (1999).
30. Satten, G. A. and Datta, S. Kaplan-Meier representation of competing risk estimates. Statistics and Probability Letters, 42, 299-304 (1999).
29. Allen, M. and Datta, S. A note on bootstrapping M-estimators in ARMA models. Journal of Time Series Analysis, 20, 365-380 (1999).
28. Bagui, S. C. and Datta, S. Some useful properties of the Bayes risk in classification. Calcutta Statistical Association Bulletin, 48, 83-91 (1998).
27. Datta, S., Mathew, G. and McCormick, W. P. Nonlinear autoregression with positive innovations. Australian & New Zealand Journal of Statistics, 40, 229-239 (1998).
26. Satten, G. A., Datta, S. and Williamson, J. M. A semiparametric approach to the proportional hazards model for interval censored data. Journal of the American Statistical Association, 93, 318-327 (1998).
25. Datta, S. and McCormick, W. P. Inference for the tail parameters of a linear process with heavy tailed innovations. Annals of the Institute of Statistical Mathematics, 50, 337-359 (1998).
24. Datta, S. Making the bootstrap work. In Frontiers in Probability and Statistics, (S. P. Mukherjee, S. K. Basu and B. K. Sinha, eds), 119-129 (1998), Narosa Publishing, New Delhi.
23. Datta, S. and Hannan, J. F. A uniform L1 law of large numbers for functions on a totally bounded metric space. Sankhya A, 59, 167-174 (1997).
22. Datta, S. L1 density estimation for linear processes. Journal of Time Series Analysis, 18, 375-383 (1997).
21. Datta, S. and Sriram, T. N. A modified bootstrap for autoregression without stationarity. Journal of Statistical Planning and Inference, 59, 19-30 (1997).
20. Datta, S. On asymptotic properties of bootstrap for AR(1) processes. Journal of Statistical Planning and Inference, 53, 361-374 (1996).
19. Datta, S. and McCormick, W. P. Bootstrap inference for a first order autoregression with positive innovations. Journal of the American Statistical Association, 90, 1289-1300 (1995).
18. Datta, S. Limit theory and bootstrap for explosive and partially explosive autoregression. Stochastic Processes and Their Applications, 57, 285-304 (1995).
17. Datta, S. and Sriram, T. N. A modified bootstrap for branching processes with immigration. Stochastic Processes and Their Applications, 56, 275-294 (1995).
16. Datta, S. On a modified bootstrap for certain asymptotically non-normal statistics. Statistics and Probability Letters, 24, 91-98 (1995).
15. Datta, S. A minimax optimal estimator for continuous monotone densities. Journal of Statistical Planning and Inference, 46, 181-193 (1995).
14. Datta, S. Consistency of the mle for a general sequential design problem. Sankhya A, 57, 88-99 (1995).
13. Datta, S. and McCormick, W. P. Some continuous Edgeworth expansions for Markov chains with applications to bootstrap. Journal of Multivariate Analysis, 52, 83-106 (1995).
12. Datta, S. Empirical Bayes estimation in a threshold model. Sankhya A, 54, 106-117 (1994).
11. Basawa, I. V. and Datta, S. Large sample estimation for nested models. Journal of the Indian Society of Probability and Statistics, 1, 19-42 (1994).
10. Datta, S. A solution to the set compound problem with certain non regular components. Statistics and Decisions, 11, 343-355 (1993).
9. Datta, S. and McCormick, W. P. Regeneration based bootstrap for Markov chains. Canadian Journal of Statistics, 21, 181-193 (1993).
8. Datta, S. and McCormick, W. P. On first order Edgeworth expansions for a Markov chain. Journal of Multivariate Analysis, 44, 345-359 (1993).
7. Datta, S. Some non asymptotic bounds for L1 density estimation using kernels. Annals of Statistics, 20, 1658-1667 (1992).
6. Bhat, B. R. and Datta, S. On the completeness of a family of conditional distributions. Statistics and Probability Letters, 15, 27-30 (1992).
5. Datta, S. A note on continuous Edgeworth expansions and the bootstrap. Sankhya A, 54, 171-182 (1992).
4. Datta, S. and McCormick, W. P. Bootstrap for a finite state Markov chain based on i.i.d. resampling. In Exploring the Limits of Bootstrap, (L. LePage and L. Billard, eds), 77-97 (1992), Wiley, New York.
3. Datta, S. Nonparametric empirical Bayes estimation with O(n^{-1/2}) rate of a truncation parameter. Statistics and Decisions, 9, 45-61 (1991).
2. Datta, S. Asymptotic optimality of Bayes compound estimators in compact exponential families. Annals of Statistics, 19, 354-365 (1991).
1. Datta, S. On the consistency of posterior mixtures and its application. Annals of Statistics, 19, 338-353 (1991).
EXTERNAL RESEARCH FUNDING (last 10 years):
AWARDS/HONORS:
PROFESSIONAL ACTIVITIES:
TEACHING:
At University of Georgia (1988 - 2005):
At University of Louisville (2005 - current):
GRADUATE STUDENTS:
PhD
· Michael R. Allen, Inference and Bootstrap for Some Linear Time Series Models. Completed: Summer 1997. Currently at Department of Mathematics, Tennessee Technological University, Cookeville, TN 38505.
· S. Kim (jointly with I. V. Basawa), Inference for Nonlinear Time Series Models via Estimating Functions. Completed: Spring 1998.
· HaiTao Zheng (jointly with I. V. Basawa), Inference for Time Series Models for Count Data. Completed: Summer 2005.
· Dipankar Bandyopadhyay, Novel Nonparametric Methods for Event Time Data. Completed: Spring 2006. Currently at Department of Biostatistics, Bioinformatics and Epidemiology, Medical University of South Carolina, Charleston, SC 29425.
· DeSale Habtzghi (jointly with M. Meyer), Maximum Likelihood Based Estimation of Hazard Function under Shape Restrictions and Related Statistical Inference. Completed: Spring 2006. Currently at Department of Statistics, University of Akron, Akron, OH 44325.
· Lan Ling, Inference for Multistate Models. Completed: Summer 2008. Currently at Department of Biostatistics and Epidemiology, Medical College of Georgia, Augusta, GA 30912.
· Vasyl Pihur (jointly with Susmita Datta), expected completion: Summer 2010.
· Doug Lorenz (jointly with R. Gill), expected completion: Summer 2010.
· Jie Fan, expected completion: Summer 2010.
MS