Vita/Revised: September, 2008

SOMNATH DATTA

 

U of L Office
Department of Bioinformatics & Biostatistics
School of Public Health and Information Sciences
University of Louisville
Louisville, KY 40202
(502) 852 6376 (phone)
(502) 852 3294 (fax)

Home Office
1518 Crosstimbers Drive
Louisville, KY 40245
(502) 245 3504 (phone/fax)

PERSONAL:

Born  1962, Calcutta (now Kolkata), India; Citizen, USA;
Married to Susmita; one child Anisha.

EDUCATION:

·        Ph. D.  (1988),  Statistics and Probability, Michigan State University, East Lansing.

·        M. Stat. (1985),  Mathematical Statistics and Probability, Indian Statistical Institute, Calcutta.

·        B. Stat. (1983),  Statistics, Indian Statistical Institute, Calcutta.

ACADEMIC AND PROFESSIONAL POSITIONS HELD:

·        2005 (Summer) – present: Professor (tenured), Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY, USA.

·        1998 (Fall) – 2005 (Spring): Professor, Department of Statistics, University of Georgia, Athens, GA, USA.

·        1993 (Fall) - 1998 (Summer):  Associate Professor (tenured), Department of Statistics, University of  Georgia,  Athens, GA, USA.

·        1988 (Fall) - 1993 (Summer):  Assistant Professor, Department of Statistics, University of Georgia, Athens, GA, USA.

RESEARCH:

Ph. D. Dissertation Title:   "Asymptotically Optimal Bayes Compound and Empirical Bayes Estimators in Exponential Families with Compact Parameter Space" (Professor James F. Hannan, Ph. D. dissertation advisor).

Research Interests (past & present): Biostatistics, Bioinformatics, Bootstrap Methods, Compound Decision, Analysis of Clustered Data, Clustering and Classification, Empirical Bayes, Genomics, Nonparametrics, Proteomics, Rank Tests, Survival Analysis, Time Series Analysis.

Clinical: Autism, Spinal Cord Injury.

PUBLICATIONS:

85. Pihur, V., Datta, S. and Datta, S. RankAggreg, an R package for weighted rank aggregation. Submitted.

 

84. Wang, M., Kong, M. and Datta, S. Inference for marginal linear models with clustered longitudinal data with potentially informative cluster sizes. Submitted.

 

83. Lan, L. and Datta, S.  Comparison of state occupation, entry, exit and waiting times in K independent multistate models under current status data. Submitted.

 

82. Datta, S., Lorenz, D. J., Morrison, S., Ardolino, E., Harkema, S. J. A multivariate examination of temporal changes in Berg variables for patients with AIS C and D spinal cord injuries. Archives of Physical Medicine and Rehabilitation, to appear (2008).

 

81. Datta, S., Bandyopadhyay, D. and Satten, G. A. Inverse probability of censoring weighted U-statistics for right censored data with applications. Submitted.

 

80. Datta, S., Lan, L. and Sundaram, R. Nonparametric estimation of waiting time distributions in a Markov model based on current status data. Submitted.

 

79. Lan, L. and Datta, S.  Nonparametric estimation of state occupation, entry and exit times with multistate current status data. Statistical Methods in Medical Research, doi:10.1177/0962280208094278 (2008).

 

78.  Pihur, V., Datta, S. and Datta, S. Finding cancer genes through meta-analysis of microarray experiments: Rank aggregation via the cross entropy algorithm. Genomics, doi:10.1016/j.ygeno.2008.05.003 (2008).

 

77. Pihur, V., Datta, S. and Datta, S. Reconstruction of genetic association networks from microarray data: A partial least squares approach. Bioinformatics, 24, 561-568 (2008). 

 

76. Datta, S. Classification of breast cancer versus normal samples from mass spectrometry profiles using linear discriminant analysis of important features selected by Random Forest, Statistical Applications in Genetics and Molecular Biology, 7 (2), Article 7 (2008).

 

75. Brock, G., Pihur, V., Datta, S. and Datta, S. clValid, an R package for cluster validation. Journal of Statistical Software, 25, 4 (2008).

 

74. Pihur, V., Brock, G., Datta, S. and Datta, S. Cluster validation for microarray data: An appraisal. In Multivariate Statistical Methods, (A. SenGupta, ed.), ISI Platinum Jubilee series, World Scientific Press, to appear (2008).

 

73. Datta, S., Datta, S., Parrish, R. S. and Thompson, C. M. Microarray data analysis. In Computational Methods in Biomedical Research, (R. Khattree and D. Naik, eds.), Chapman & Hall/CRC Biostatistics Series, Volume 24, 1-43 (2008).

 

72.  Datta, S. and Satten, G. A. A signed-rank test for clustered data. Biometrics, 64, 501-507 (2008).

 

71. Bandyopadhyay, D. and Datta, S.  A novel approach to testing equality of survival distributions when the population marks are missing. Journal of Statistical Planning and Inference, 138, 1722-1732 (2008).

 

70. Pihur, V., Datta, S. and Datta, S. Understanding Chronic Fatigue Syndrome (CFS) from CAMDA data: A systems biology approach.  In CAMDA 2007 Proceedings,  online @ http://camda.bioinfo.cipf.es/camda07/agenda/detailed.html (2007).

 

69. Johnson, S. B., Datta, S., Hornung, C. A., Casanova, M. F. Mathematical models of epigenetic influences in Autism: a new perspective based on neuropathological findings. In Progress in Autism Research, (Paul C. Carlisle, ed.), Nova Science Publishers, Inc., New York, 101-114 (2007).

 

68. Pihur, V., Datta, S. and Datta, S. Weighted rank aggregation of cluster validation measures: A Monte Carlo cross-entropy approach.  Bioinformatics, 23, 1607-1615 (2007).

 

67. Boratyn, G. M., Datta, S. and Datta, S. Incorporation of biological knowledge into distance for clustering genes, Bioinformation, 1, 396-405 (2007).

 

66.  Datta, S., Le-Rademacher, J. and Datta, S. Predicting patient survival from microarray data by accelerated failure time modeling using partial least squares and LASSO. Biometrics,  63, 259-271 (2007).

 

65. Zheng, H., Basawa, I. V. and Datta, S. First order random coefficient autoregressive processes, Journal of Statistical Planning and Inference, 137, 212-229 (2007).

 

64. Datta, S., and Datta, S. Combining functional information in selecting clustering algorithms. In Proceedings of Interface 2005, on CD-ROM (2006).

 

63. Datta, S.  and Datta, S. Evaluation of clustering algorithms for gene expression data, BMC Bioinformatics,  7 (Suppl 4): S17, (2006).

 

62. Datta, S.  and Datta, S. Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes, BMC Bioinformatics, 7:397 (2006). 

 

61. Boratyn, G. M., Datta, S. and Datta, S. Biologically supervised hierarchical clustering algorithms for gene expression data, In Proceedings of the 28th IEEE  EMBS Annual International Conference, New York City, USA, 5515-5518 (2006).

 

60. Zheng, H., Basawa, I. V. and Datta, S.  The p-th order random coefficient autoregressive processes, Journal of Time Series Analysis, 27, 411-440 (2006).

 

59. Datta, S.  and Sundaram, R. Nonparametric marginal estimation in a multistage  model using current status data, Biometrics, 62, 829–837 (2006).

 

58. Datta, S. and Datta, S. Validation measures for clustering algorithms incorporating biological information, In IEEE Proceedings of International Multi-Symposiums on Computer and Computational Sciences (IMSCCS|06), (J. Ni, J. Dongarra, Y. Zheng, G. Gu, G. Wolfgang and H. Jin, eds.), 1, 131-135 (2006). http://doi.ieeecomputersociety.org/10.1109/IMSCCS.2006.139

 

57. Datta, S.  Estimating the mean life time using right censored data. Statistical  Methodology, 2, 65-69 (2005).

 

56. Datta, S. and Datta, S. Empirical Bayes screening (EBS) of many p-values with applications to microarray studies, Bioinformatics, 21, 1987-1994 (2005).

 

55. Datta, S. and Satten, G. A. Rank-sum tests for clustered data, Journal of the American Statistical Association, 100, 908-915 (2005).

 

54. Datta, S. Bootstrapping,  In Encyclopedia of Statistical Sciences, Second edition, Wiley, (2005).

 

53. Datta, S. Empirical Bayes methods, In Encyclopedia of Statistical Sciences, Second edition, Wiley, (2005).

 

52. Satten, G. A., Datta, S., Moura, H., Woolfitt, A., Carvalho, G., De, B. K,  Pavlopoulos, A., Carlone, G. M., and Barr, J. Standardization and denoising algorithms for mass spectra to classify whole-organism bacterial specimens,  Bioinformatics, 20, 3128-3136 (2004). 

 

51. Datta, S.  and Datta, S. An empirical Bayes adjustment to multiple p-values for the detection of differentially expressed genes in microarray experiments. In APBC 2004, (Y-P. P. Chen, ed.), 29, 155-159 (2004).

 

50. Datta, S.,  Satten, G. A., Benos, D. J., Xia, J.,  Heslin, M., and Datta, S. An empirical Bayes adjustment to increase the sensitivity of detecting differentially expressed genes in microarray experiments, Bioinformatics,  20, 235-242 (2004).

 

49. Satten, G. A. and Datta, S. Marginal Analyses of Multistage Data. In Handbook of Statistics (N. Balakrishnan and C. R. Rao, eds.), 23,  559-574, Elsevier-North Holland (2004).

 

48.  Chakraborty, S. and Datta, S. How will plant pathogens adapt to host plant resistance at elevated CO2 under a changing climate? New Phytologist, 159, 733-742 (2003).

 

47. Datta, S. and Datta, S. Comparisons and validation of statistical clustering techniques for microarray gene expression data.  Bioinformatics, 19,  459-466 (2003).

 

46. Williamson, J., Datta, S., and Satten, G. A. Marginal analyses of clustered data when cluster size is informative. Biometrics, 59, 36-42 (2003).

 

45. Datta, S. and Satten, G. A. Estimation of integrated transition hazards and stage occupation probabilities for non-Markov systems under stage dependent censoring. Biometrics, 58, 792-802 (2002).

 

44. Satten, G. A. and Datta, S.  Marginal estimation for Multistage models: waiting time distributions and competing risk analyses. Statistics in Medicine, 21, 3-19 (2002).

 

43. Datta, S. and Satten, G. A. Validity of the Aalen-Johansen estimators of stage occupation probabilities and integrated transition hazards for non-Markov models.  Statistics and Probability Letters, 55, 403-411 (2001).

 

42. Satten, G. A., Datta, S. and Robins, J. M. An estimator for the survival function when data are subject to dependent censoring.  Statistics and Probability Letters, 54, 397-403 (2001).

 

41. Satten, G. A. and Datta, S. The Kaplan-Meier Estimator as an inverse-probability-of-censoring weighted average. American Statistician, 55, 207-210 (2001).

 

40. Williamson, J. M., Satten, G. A., Hanson, J. A., Weinstock, H., and Datta, S. Analysis of dynamic cohort data. American Journal of Epidemiology, 154, 366-372 (2001).

 

39. Li, G. and Datta, S.  A bootstrap approach to nonparametric regression for right censored data. Annals of the Institute of Statistical Mathematics, 53, 708-729 (2001).

 

38. Datta, S. and Satten, G. A.  Estimating future stage entry and occupation probabilities in a multistage model based on randomly right-censored data. Statistics and Probability Letters, 50, 89-95 (2000).

 

37. Satten, G. A. and Datta, S.  A simulate-update algorithm for missing data problems. Computational Statistics, 15, 243-277 (2000).

 

36. Datta, S.  Empirical Bayes estimation with non-identical components.  Journal of Nonparametric Statistics, 12, 709-725 (2000).

 

35. Datta, S., Satten, G. A. and Datta, S. Nonparametric estimation for the three-stage irreversible illness-death model. Biometrics, 56, 841-847 (2000).

 

34. Datta, S.,  Satten, G. A. and  Williamson, J. M.  Consistency and asymptotic normality of estimators in a regression model with interval censoring and left truncation.  Annals of the Institute of Statistical Mathematics, 52, 160-172 (2000).

 

33. Datta, S., Satten, G. A. and Datta, S.  Estimation of stage occupation probabilities in multistage models. In Advances on Theoretical and Methodological Aspects of Probability and Statistics, (N. Balakrishnan, ed.), 493-506 (2000), Gordon and Breach.

 

32. Satten, G. A., Janssen, R., Busch, M. P., and Datta, S. Validating marker-based incidence estimates in repeatedly screened populations. Biometrics, 55, 1224-1227 (1999).

 

31. Allen, M. R. and Datta, S.  Estimation of the index parameter for autoregressive data using the estimated innovations.  Statistics and Probability Letters, 41, 315-324 (1999).

 

30. Satten, G. A. and Datta, S.  Kaplan-Meier representation of competing risk estimates. Statistics and Probability Letters, 42, 299-304 (1999).

 

29. Allen, M. and Datta, S. A note on bootstrapping M-estimators in ARMA models. Journal of Time Series Analysis, 20, 365-380 (1999).

 

28. Bagui, S. C. and Datta, S. Some useful properties of the Bayes risk in classification. Calcutta Statistical Association Bulletin, 48, 83-91 (1998).

 

27. Datta, S., Mathew, G. and McCormick, W. P. Nonlinear autoregression with positive innovations. Australian & New Zealand Journal of Statistics, 40, 229-239 (1998).

 

26. Satten, G. A., Datta, S. and Williamson, J. M. A semiparametric approach to the proportional hazards model for interval censored data. Journal of the American Statistical Association, 93, 318-327 (1998).

 

25. Datta, S. and McCormick, W. P. Inference for the tail parameters of a linear process with heavy tailed innovations.  Annals of the Institute of Statistical Mathematics,  50, 337-359 (1998).

 

24. Datta, S. Making the bootstrap work. In Frontiers in Probability and Statistics, (S. P. Mukherjee, S. K. Basu and B. K. Sinha, eds), 119-129 (1998), Narosa Publishing, New Delhi.

 

23. Datta, S. and Hannan, J. F. A uniform L1 law of large numbers for functions on a totally bounded metric space. Sankhya A,  59, 167-174 (1997).

 

22.  Datta, S.  L1 density estimation for linear processes.  Journal of Time Series Analysis,  18, 375-383 (1997).

 

21. Datta, S. and Sriram, T. N. A modified bootstrap for autoregression without stationarity.  Journal of Statistical Planning and Inference,  59, 19-30 (1997).

 

20. Datta, S. On asymptotic properties of bootstrap for AR(1) processes. Journal of Statistical Planning and Inference,  53, 361-374 (1996).

 

19. Datta, S. and McCormick, W. P. Bootstrap inference for a first order autoregression with positive innovations. Journal of the American Statistical Association, 90, 1289-1300 (1995).

 

18. Datta, S.  Limit theory and bootstrap for explosive and partially explosive autoregression. Stochastic Processes and Their Applications,  57, 285-304 (1995).

 

17. Datta, S. and Sriram, T. N. A modified bootstrap for branching processes with immigration. Stochastic Processes and Their Applications,  56, 275-294 (1995).

 

16. Datta, S. On a modified bootstrap for certain asymptotically non-normal statistics. Statistics and Probability Letters, 24, 91-98 (1995).

 

15. Datta, S. A minimax optimal estimator for continuous monotone densities. Journal of Statistical Planning and Inference, 46, 181-193 (1995).

 

14. Datta, S. Consistency of the mle for a general sequential design problem. Sankhya A, 57, 88-99 (1995).

 

13. Datta, S. and McCormick, W. P. Some continuous Edgeworth expansions for Markov chains with applications to bootstrap. Journal of Multivariate Analysis, 52, 83-106 (1995).

 

12. Datta, S.  Empirical Bayes estimation in a threshold model. Sankhya A, 54, 106-117  (1994).

 

11. Basawa, I. V. and Datta, S.  Large sample estimation for nested models. Journal of the Indian Society of Probability and Statistics, 1, 19-42 (1994).

 

10. Datta, S.  A solution to the set compound problem with certain non regular components. Statistics and Decisions, 11, 343-355 (1993).

 

9. Datta, S. and McCormick, W. P. Regeneration based bootstrap for Markov chains. Canadian Journal of Statistics, 21, 181-193 (1993).

 

8. Datta, S. and McCormick, W. P. On first order Edgeworth expansions for a Markov chain. Journal of Multivariate Analysis, 44, 345-359 (1993).

 

7. Datta, S.  Some non asymptotic bounds for L1 density estimation using kernels. Annals of Statistics, 20, 1658-1667 (1992).

 

6. Bhat, B. R. and Datta, S.  On the completeness of a family of conditional distributions. Statistics and Probability Letters, 15, 27-30 (1992).

 

5. Datta, S.  A note on continuous Edgeworth expansions and the bootstrap. Sankhya A,  54, 171-182 (1992).

 

4. Datta, S.  and McCormick, W. P. Bootstrap for a finite state Markov chain based on i.i.d. resampling. In Exploring the Limits of Bootstrap, (L. LePage and L. Billard, eds), 77-97 (1992), Wiley, New York.

 

3. Datta, S. Nonparametric empirical Bayes estimation with O(n^{-1/2}) rate of a truncation parameter. Statistics and Decisions, 9, 45-61 (1991).

 

2. Datta, S.  Asymptotic optimality of Bayes compound estimators in compact exponential families. Annals of Statistics, 19, 354-365 (1991).

 

1. Datta, S.  On the consistency of posterior mixtures and its application. Annals of Statistics, 19, 338-353 (1991).

EXTERNAL RESEARCH FUNDING (last 10 years):

  AWARDS/HONORS:

  PROFESSIONAL ACTIVITIES:

 TEACHING:

At University of Georgia (1988—2005):

 

 

          At University of Louisville (2005 - current):

 

GRADUATE STUDENTS:

PhD

·        Michael R. Allen, Inference and Bootstrap for Some Linear Time Series Models.  Completed: Summer 1997. Currently at Department of Mathematics, Tennessee Technological University, Cookeville, TN 38505.

 

·        S. Kim (jointly with I. V. Basawa), Inference for Nonlinear Time Series Models via Estimating Functions. Completed: Spring 1998.

 

·        HaiTao Zheng (jointly with I. V. Basawa), Inference for Time Series Models for Count Data. Completed: Summer 2005.

 

·        Dipankar Bandyopadhyay, Novel Nonparametric Methods for Event Time Data. Completed: Spring 2006. Currently at Department of Biostatistics, Bioinformatics and Epidemiology, Medical University of South Carolina, Charleston, SC 29425.

 

·        DeSale Habtzghi (jointly with M. Meyer), Maximum Likelihood Based Estimation of Hazard Function under Shape Restrictions and Related Statistical Inference. Completed: Spring 2006. Currently at Department of Statistics, University of Akron, Akron, OH 44325.

 

·        Lan Ling, Inference for Multistate Models.  Completed: Summer 2008. Currently at Department of Biostatistics and Epidemiology, Medical College of Georgia, Augusta, GA 30912.

 

·        Vasyl Pihur (jointly with Susmita Datta), expected completion: Summer 2010.

 

·        Doug Lorenz (jointly with R. Gill), expected completion: Summer 2010.

 

·        Jie Fan, expected completion: Summer 2010.

 

MS

·   &