  • 1. Text Classification and Classifiers: A Survey

    Vandana Korde, C Namrata Mahender and Sardar Vallabhbhai
    March 2012 | Cited by 147

    As most information (over 80%) is stored as text, text mining is believed to have high commercial potential value. Knowledge may be discovered from many sources of information, yet unstructured texts remain the largest readily available source of knowledge. Text classification assigns documents to predefined categories. In this paper we introduce text classification and the text classification process, give an overview of classifiers, and compare some existing classifiers on the basis of a few criteria such as time complexity, principle and performance.
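    To make the classification step concrete, here is a minimal sketch of one widely used text classifier, multinomial Naive Bayes with Laplace smoothing, in plain Python. The whitespace tokenizer, the function names and the toy spam/ham documents are hypothetical illustrations; this is not presented as the survey's own implementation of any classifier it compares.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def train_nb(docs, labels):
    """Estimate class priors and per-class word counts for multinomial Naive Bayes."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    return priors, word_counts, vocab

def classify_nb(text, priors, word_counts, vocab):
    """Return the class with the highest log-posterior (Laplace smoothing, alpha = 1)."""
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        score = math.log(prior)
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen during training
            score += math.log((word_counts[c][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

docs = ["cheap pills buy now", "meeting agenda attached",
        "buy cheap watches", "project meeting tomorrow"]
labels = ["spam", "ham", "spam", "ham"]
model = train_nb(docs, labels)
print(classify_nb("buy cheap pills", *model))         # spam
print(classify_nb("agenda for the meeting", *model))  # ham
```

    Working in log space avoids floating-point underflow when documents are long; other classifiers (k-NN, SVM, decision trees) would replace only the training and scoring functions.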
  • 2. A Machine Learning Approach for Opinion Holder Extraction in Arabic Language

    Mohamed Elarnaoty, Samir AbdelRahman and Aly Fahmy
    March 2012 | Cited by 75

    Opinion mining aims at extracting useful subjective information from reliable amounts of text. Opinion holder recognition is a task that has not yet been addressed for the Arabic language. This task essentially requires a deep understanding of clause structure. Unfortunately, the lack of a robust, publicly available Arabic parser further complicates the research. This paper presents pioneering research on opinion holder extraction from Arabic news that is independent of any lexical parser. We investigate constructing a comprehensive feature set to compensate for the lack of structural parsing output. The proposed feature set is adapted from previous work on English, combined with our proposed semantic field and named entity features. Our feature analysis is based on Conditional Random Fields (CRF) and semi-supervised pattern recognition techniques.
  • 3. Mining Frequent Itemsets Using Genetic Algorithm

    Soumadip Ghosh, Sushanta Biswas, Debasree Sarkar and Partha Pratim Sarkar
    October 2010 | Cited by 70

    In general, frequent itemsets are generated from large data sets by applying association rule mining algorithms such as Apriori, Partition, Pincer-Search, Incremental and Border, which take too much computer time to compute all the frequent itemsets. Using a Genetic Algorithm (GA) can improve the situation. The major advantages of using a GA to discover frequent itemsets are that it performs a global search and that its time complexity is lower than that of the other algorithms, as the genetic algorithm is based on a greedy approach. The main aim of this paper is to find all the frequent itemsets in given data sets using a genetic algorithm.
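    The idea of searching for frequent itemsets with a GA can be sketched as follows, assuming a common encoding that is not necessarily the paper's: each chromosome is a bit string over the item universe, fitness is the itemset's support, parents are chosen by truncation selection, and offspring come from one-point crossover plus bit-flip mutation. All function names and parameter defaults are hypothetical. Note that this stochastic sketch only reports the frequent itemsets it happens to encounter; unlike Apriori, it gives no completeness guarantee.

```python
import random

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def decode(bits, items):
    """Chromosome -> itemset: bit i set means items[i] is in the itemset."""
    return frozenset(item for item, b in zip(items, bits) if b)

def ga_frequent_itemsets(transactions, min_sup, pop_size=20, generations=30,
                         mutation_rate=0.1, seed=0):
    """Evolve bit-string chromosomes with support as fitness; collect frequent ones seen."""
    rng = random.Random(seed)
    items = sorted(set().union(*transactions))
    n = len(items)  # assumes at least two distinct items (needed for crossover)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    found = set()
    for _ in range(generations):
        scored = []
        for bits in population:
            itemset = decode(bits, items)
            fit = support(itemset, transactions) if itemset else 0.0
            if itemset and fit >= min_sup:
                found.add(itemset)  # record every frequent itemset encountered
            scored.append((fit, bits))
        # Truncation selection: the fitter half become parents.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        parents = [bits for _, bits in scored[: pop_size // 2]]
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1)                  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    return found

transactions = [frozenset(t) for t in ({"bread", "milk"}, {"bread", "butter"},
                                       {"bread", "milk", "butter"}, {"milk"})]
result = ga_frequent_itemsets(transactions, min_sup=0.5)
for s in sorted(result, key=sorted):
    print(sorted(s), support(s, transactions))
```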
  • 4. Genetic K-Means Clustering Algorithm for Mixed Numeric and Categorical Data Sets

    Dharmendra K Roy and Lokesh K Sharma
    April 2010 | Cited by 65

    Clustering is one of the major data mining tasks; it aims at grouping data objects into meaningful classes (clusters) such that the similarity of objects within a cluster is maximized and the similarity of objects in different clusters is minimized. In this paper we present a clustering algorithm based on the Genetic k-means paradigm that works well for data with mixed numeric and categorical features. We propose a modified description of the cluster center to overcome the numeric-data-only limitation of the Genetic k-means algorithm and to provide a better characterization of clusters. The performance of this algorithm has been studied on benchmark data sets.
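    The abstract does not spell out the modified cluster center, so the sketch below assumes one well-known convention (the one used by Huang's k-prototypes): the center takes the mean of each numeric attribute and the mode of each categorical attribute, and the distance adds squared Euclidean distance on numeric attributes to a weight `gamma` times the number of categorical mismatches. The function names, `gamma`, and the sample records are hypothetical, and the genetic operators are not shown.

```python
from collections import Counter

def mixed_center(points, num_idx, cat_idx):
    """Cluster center: mean of numeric attributes, mode of categorical attributes."""
    center = {}
    for i in num_idx:
        center[i] = sum(p[i] for p in points) / len(points)
    for i in cat_idx:
        center[i] = Counter(p[i] for p in points).most_common(1)[0][0]
    return center

def mixed_distance(point, center, num_idx, cat_idx, gamma=1.0):
    """Squared Euclidean distance on numeric parts plus gamma times categorical mismatches."""
    numeric = sum((point[i] - center[i]) ** 2 for i in num_idx)
    categorical = sum(point[i] != center[i] for i in cat_idx)
    return numeric + gamma * categorical

# Hypothetical records: (age, income in thousands, colour, city)
cluster = [(25, 40.0, "red", "A"), (30, 50.0, "red", "B"), (35, 60.0, "blue", "A")]
num_idx, cat_idx = [0, 1], [2, 3]
c = mixed_center(cluster, num_idx, cat_idx)
print(c)  # {0: 30.0, 1: 50.0, 2: 'red', 3: 'A'}
print(mixed_distance((28, 45.0, "blue", "A"), c, num_idx, cat_idx))  # 30.0
```

    In a Genetic k-means setting, a distance of this kind would be used when assigning objects to centers and when evaluating the fitness of a chromosome's partition; `gamma` balances the two attribute types and would need tuning per data set.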
  • 5. Predicting Learners Performance Using Artificial Neural Networks in Linear Programming Intelligent Tutoring System

    Samy S. Abu Naser
    March 2012 | Cited by 53

    In this paper we present a technique that employs Artificial Neural Networks and expert systems to obtain knowledge for the learner model in the Linear Programming Intelligent Tutoring System (LP-ITS), so that the system can determine the academic performance level of each learner and offer him or her linear programming problems of the proper difficulty level. LP-ITS uses the feed-forward back-propagation algorithm, trained on a group of learners' data, to predict their academic performance. Furthermore, LP-ITS uses an expert system to decide the difficulty level suitable for the predicted academic performance of the learner. Several tests have been carried out to examine adherence to real-time data. The accuracy of predicting the performance of the learners is very high, which indicates that the Artificial Neural Network is skilled enough to make.
  • 6. An Efficient Automatic Mass Classification Method in Digitized Mammograms Using Artificial Neural Network

    Mohammed J. Islam, Majid Ahmadi and Maher A. Sid-Ahmed
    July 2010 | Cited by 51

    In this paper we present an efficient computer-aided mass classification method for digitized mammograms using an Artificial Neural Network (ANN), which performs benign-malignant classification on a region of interest (ROI) that contains a mass. One of the major mammographic characteristics for mass classification is texture, and the ANN exploits this important factor to classify the mass as benign or malignant. The statistical textural features used to characterize the masses are mean, standard deviation, entropy, skewness, kurtosis and uniformity. The main aim of the method is to increase the effectiveness and efficiency of the classification process in an objective manner, so as to reduce the number of false-positive malignancies. A three-layer artificial neural network (ANN) with seven features was proposed for classifying the marked regions into benign and malignant and 90.91.
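    The textural features named above can be computed from the normalized grey-level histogram p(z_i) of the ROI. The sketch below uses the standard first-order definitions (uniformity as the sum of p squared, entropy in bits, skewness and kurtosis as standardized third and fourth moments, without the "minus 3" excess correction); the paper may normalize these differently. The function name and the `levels` parameter are hypothetical, and the ROI is assumed to hold integer grey levels in [0, levels).

```python
import numpy as np

def texture_features(roi, levels=256):
    """First-order statistical texture features of a grey-level ROI, from its histogram."""
    roi = np.asarray(roi, dtype=np.int64).ravel()
    hist = np.bincount(roi, minlength=levels).astype(float)
    p = hist / hist.sum()                    # normalised histogram p(z_i)
    z = np.arange(levels, dtype=float)
    mean = np.sum(z * p)
    var = np.sum((z - mean) ** 2 * p)
    std = np.sqrt(var)
    nz = p[p > 0]                            # skip empty bins to avoid log(0)
    entropy = -np.sum(nz * np.log2(nz))
    skewness = np.sum((z - mean) ** 3 * p) / std ** 3 if std > 0 else 0.0
    kurtosis = np.sum((z - mean) ** 4 * p) / var ** 2 if var > 0 else 0.0
    uniformity = np.sum(p ** 2)
    return {"mean": mean, "std": std, "entropy": entropy,
            "skewness": skewness, "kurtosis": kurtosis, "uniformity": uniformity}

# Toy 2x2 "ROI" with 4 grey levels: half the pixels at level 0, half at level 2.
roi = np.array([[0, 0], [2, 2]], dtype=np.uint8)
print(texture_features(roi, levels=4))
```

    For this toy ROI the mean is 1, the standard deviation 1, the entropy 1 bit, the skewness 0, the kurtosis 1 and the uniformity 0.5; a feature vector like this is what would be fed to the input layer of the classifier.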
  • 7. Comparison of Support Vector Machine and Back Propagation Neural Network in Evaluating the Enterprise Financial Distress

    Ming-Chang Lee and Chang To
    July 2010 | Cited by 49

    Recently, applying novel data mining techniques to evaluate enterprise financial distress has received much research attention. Support Vector Machines (SVM) and back-propagation neural (BPN) networks have been applied successfully in many areas, such as rule extraction, classification and evaluation, with excellent generalization results. In this paper, a model based on SVM with a Gaussian RBF kernel is proposed for enterprise financial distress evaluation. The BPN network is considered one of the simplest and most general methods used for supervised training of multilayered neural networks. The comparative results show that although the difference between the performance measures is marginal, SVM gives higher precision and lower error rates.
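    For reference, the Gaussian RBF kernel is K(x, y) = exp(-||x - y||^2 / (2 sigma^2)), and the two reported measures are precision (true positives over predicted positives) and error rate (fraction of misclassified cases). The NumPy sketch below computes both; the function names and the `sigma` parametrization are illustrative assumptions (libraries such as scikit-learn's `SVC(kernel="rbf")` instead expose `gamma = 1 / (2 sigma^2)`), and the labels are made up.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian RBF kernel matrix K[i, j] = exp(-||X[i] - Y[j]||^2 / (2 sigma^2))."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, clipped at 0 against rounding error
    sq_dists = (np.sum(X ** 2, axis=1)[:, None] + np.sum(Y ** 2, axis=1)[None, :]
                - 2 * X @ Y.T)
    return np.exp(-np.maximum(sq_dists, 0.0) / (2 * sigma ** 2))

def precision_and_error_rate(y_true, y_pred, positive=1):
    """Precision for the positive (e.g. distressed) class and overall error rate."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    predicted_pos = y_pred == positive
    tp = np.sum(predicted_pos & (y_true == positive))
    precision = tp / predicted_pos.sum() if predicted_pos.any() else 0.0
    error_rate = np.mean(y_true != y_pred)
    return precision, error_rate

K = rbf_kernel([[0.0, 0.0]], [[0.0, 0.0], [1.0, 1.0]], sigma=1.0)
print(K)  # values: exp(0) = 1.0 and exp(-1), about 0.368
print(precision_and_error_rate([1, 0, 1, 1, 0], [1, 1, 1, 0, 0]))  # precision 2/3, error 0.4
```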
  • 8. Optimizing Face Recognition Using PCA

    Manal Abdullah, Majda Wazzan and Sahar Bo-saeed
    March 2012 | Cited by 47

    Principal Component Analysis (PCA) is a classical feature extraction and data representation technique widely used in pattern recognition, and one of the most successful techniques in face recognition. However, it has the drawback of high computational cost, especially for large databases. This paper conducts a study to reduce the time complexity of PCA (eigenfaces) without affecting recognition performance. The authors minimize the number of participating eigenvectors, which consequently decreases the computation time. The recognition time of the original algorithm is compared with that of the enhanced algorithm. The performance of both the original and the proposed enhanced algorithm is tested on the face94 face database. Experimental results show that the recognition time is reduced by 35% by applying our proposed enhanced algorithm.
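    The role of the number of retained eigenvectors can be seen in a minimal eigenfaces sketch, assuming images are flattened into row vectors, principal axes are obtained by SVD of the mean-centered training matrix (equivalent to eigen-decomposing the covariance), and recognition is nearest-neighbour matching in eigenface space. The parameter `k` is the number of eigenvectors kept; how the paper chooses it is not described here, so it is left as a free parameter. The function names and the random stand-in "faces" are hypothetical.

```python
import numpy as np

def fit_eigenfaces(train, k):
    """Mean face and top-k principal axes ("eigenfaces") of flattened training images."""
    train = np.asarray(train, dtype=float)
    mean = train.mean(axis=0)
    # Rows of vt are the covariance eigenvectors, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:k]                  # keep only k eigenvectors

def project(images, mean, eigenfaces):
    """Weights of each image in the k-dimensional eigenface space."""
    return (np.atleast_2d(np.asarray(images, dtype=float)) - mean) @ eigenfaces.T

def recognise(probe, train_weights, train_labels, mean, eigenfaces):
    """Nearest-neighbour match of a probe image in eigenface space."""
    w = project(probe, mean, eigenfaces)[0]
    dists = np.linalg.norm(train_weights - w, axis=1)
    return train_labels[int(np.argmin(dists))]

rng = np.random.default_rng(0)
train = rng.normal(size=(10, 100))       # 10 synthetic "faces" of 100 pixels each
labels = [f"person{i}" for i in range(10)]
mean, eigenfaces = fit_eigenfaces(train, k=5)
weights = project(train, mean, eigenfaces)
probe = train[3] + 0.01 * rng.normal(size=100)
print(recognise(probe, weights, labels, mean, eigenfaces))
```

    Each comparison costs O(k) after an O(k x pixels) projection, so shrinking `k` directly reduces recognition time; the trade-off is that too small a `k` discards discriminative variance.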
  • 9. A Framework for Intelligent Medical Diagnosis Using Rough Set With Formal Concept Analysis

    B.K.Tripathy, D.P.Acharjya and V.Cynthya
    April 2011 | Cited by 42

    Medical diagnosis processes vary in the degree to which they attempt to deal with different complicating aspects of diagnosis, such as the relative importance of symptoms, varied symptom patterns and the relations between diseases themselves. Based on decision theory, many mathematical models such as crisp sets, probability distributions, fuzzy sets and intuitionistic fuzzy sets were developed in the past to deal with these complicating aspects of diagnosis. However, many such models fail to include important aspects of expert decisions. Therefore, Pawlak made an effort to process inconsistencies in data with the introduction of rough set theory. Though rough sets have major advantages over the other methods, they generate too many rules, which creates many difficulties when taking decisions. Therefore, it is essential to minimize the decision rules. In this paper, we use two processes, pre-processing and post-processing, to mine suitable rules and to explore the relationships among the attributes. In pre-processing we use rough set theory to mine suitable rules, whereas in post-processing we apply formal concept analysis to these suitable rules to explore better knowledge and the most important factors affecting decision making.
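    The way rough set theory handles inconsistent data rests on Pawlak's lower and upper approximations: objects with identical condition-attribute values are indiscernible, the lower approximation collects the indiscernibility classes that lie entirely inside the target set, and the upper approximation collects those that merely overlap it. The sketch below shows only this core, not the authors' pre-processing/post-processing framework or the formal concept analysis step; the patient table and function names are hypothetical.

```python
from collections import defaultdict

def equivalence_classes(objects, attributes):
    """Partition object ids by their values on the condition attributes (indiscernibility)."""
    classes = defaultdict(set)
    for obj_id, values in objects.items():
        classes[tuple(values[a] for a in attributes)].add(obj_id)
    return list(classes.values())

def approximations(objects, attributes, target):
    """Pawlak lower and upper approximations of the target set of object ids."""
    lower, upper = set(), set()
    for block in equivalence_classes(objects, attributes):
        if block <= target:
            lower |= block   # every object in the block is certainly in the target
        if block & target:
            upper |= block   # the block possibly belongs to the target
    return lower, upper

# Hypothetical decision table: p1 and p2 share symptoms but differ in diagnosis.
patients = {
    "p1": {"fever": "high", "cough": "yes"},
    "p2": {"fever": "high", "cough": "yes"},
    "p3": {"fever": "none", "cough": "yes"},
    "p4": {"fever": "none", "cough": "no"},
}
flu = {"p1", "p3"}
lower, upper = approximations(patients, ["fever", "cough"], flu)
print(sorted(lower), sorted(upper), sorted(upper - lower))
# ['p3'] ['p1', 'p2', 'p3'] ['p1', 'p2']
```

    Certain decision rules are read off the lower approximation (here, "fever = none and cough = yes implies flu"), while the boundary region (upper minus lower) holds the inconsistent records, which yield only possible rules.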
  • 10. Hiding Sensitive Association Rules Without Altering the Support of Sensitive Item(s)

    Dhyanendra Jain, Amit Sinhal, Neetesh Gupta, Priusha Narwariya, Deepika Saraswat and Amit Pandey
    March 2012 | Cited by 31

    Association rule mining is an important data mining technique that finds interesting associations among a large set of data items. Since it may disclose patterns and various kinds of sensitive knowledge that are difficult to find otherwise, it may pose a threat to the privacy of confidential information, which must be protected against unauthorized access. Many strategies have been proposed to hide such information; some use distributed databases over several sites, data perturbation, clustering, or data distortion techniques. An aspect of the sensitive-rule hiding problem that is still not sufficiently investigated is the requirement to balance the confidentiality of the disclosed data with the legitimate needs of the user. The proposed approach uses a data distortion technique in which the position of the sensitive items is altered but their support is never changed.
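    One way to see how altering the position of an item can leave its support untouched while weakening a sensitive rule is the hypothetical sketch below (not the authors' algorithm): for a rule lhs -> item, the item is removed from one transaction that supports the whole rule and added to one that contains neither the item nor the full left-hand side. The item's support count stays the same, but the rule's support and confidence drop. The function names and toy transactions are illustrative assumptions.

```python
def support_count(itemset, transactions):
    """Number of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions)

def confidence(lhs, rhs, transactions):
    """Confidence of the rule lhs -> rhs."""
    return support_count(lhs | rhs, transactions) / support_count(lhs, transactions)

def move_item(item, lhs, transactions):
    """Hypothetical distortion step: relocate `item` from a transaction supporting
    lhs -> item to one lacking both the item and the full lhs, so the item's own
    support count is unchanged while the rule loses one supporting transaction."""
    rule_items = lhs | {item}
    src = next((i for i, t in enumerate(transactions) if rule_items <= t), None)
    dst = next((i for i, t in enumerate(transactions)
                if item not in t and not lhs <= t), None)
    if src is None or dst is None:
        return transactions  # no valid move exists
    new = [set(t) for t in transactions]
    new[src].discard(item)
    new[dst].add(item)
    return [frozenset(t) for t in new]

T = [frozenset(t) for t in ({"a", "b"}, {"a", "b"}, {"a", "c"}, {"c"})]
lhs = frozenset({"a"})
print(support_count({"b"}, T), confidence(lhs, {"b"}, T))    # 2 0.666...
T2 = move_item("b", lhs, T)
print(support_count({"b"}, T2), confidence(lhs, {"b"}, T2))  # 2 0.333...
```

    With a minimum confidence threshold of 0.5, the rule a -> b would no longer be reported after this single move, although item b still appears in exactly two transactions. A real scheme would also have to account for side effects on other, non-sensitive rules.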