I am interested in the synergy between statistical text analytics
and knowledge-based reasoning and, in particular, in investigating the use of
lightweight semantics in combination
with machine learning techniques for scalable, automated information retrieval.
Projects at Siemens
Power Semantics: Analyzing unstructured text service reports in Siemens Power Generation for more efficient tracking, monitoring, and searching of service-related activities.
Data Cleansing: Reconciling inconsistencies between various data sources using a combination of learning techniques and rules.
Spend Analysis: Automatically classifying material master descriptions of spend transactions into commodity code hierarchies.
Theseus-Medico: Developing technologies for bringing Semantic Web concepts of ontologies, metadata and their representation in RDF, RDFS, and OWL to medical imaging for semantic image understanding and retrieval.
Selected Publications:
Classifying Spend Transactions with Off-the-Shelf Learning Components -
Saikat Mukherjee, Dmitriy Fradkin, Michael Roth in
IEEE International Conference on Tools with Artificial Intelligence (ICTAI)' 2008
Medical Image Understanding through the Integration of Cross-Modal Object Recognition with Formal Domain Knowledge -
Manuel Moeller, Michael Sintek, Paul Buitelaar, Saikat Mukherjee, Xiang Sean Zhou, Joerg Freund in
International Conference on Health Informatics (HEALTHINF) 2008
PhD Dissertation
In my PhD dissertation,
Automated Semantic Analysis of Schematic Data,
I worked on the semantic understanding
of template-generated semi-structured and unstructured data sources.
By coupling machine learning with domain knowledge, I developed
highly automated and scalable solutions to this problem.
I have applied my dissertation ideas to relevant problems in
assistive technologies, mobile device browsing, Web transactions, and
information extraction.
Selected Publications:
Automated Semantic Analysis of Schematic Data -
Saikat Mukherjee, I.V. Ramakrishnan in World Wide Web Journal (Springer), online first 2008
Bootstrapping Semantic Annotation for Content-Rich HTML Documents -
Saikat Mukherjee, I.V. Ramakrishnan, Amarjeet Singh in
International Conference on Data Engineering (ICDE)' 2005
Automatic Annotation of Content-Rich Web Documents: Structural and Semantic Analysis -
Saikat Mukherjee, Guizhen Yang, I.V. Ramakrishnan in
International Semantic Web Conference (ISWC)' 2003