• YearPredictionMSD

    Prediction of the release year of a song from audio features. Songs are mostly western, commercial tracks ranging from 1922 to 2011, with a peak in the 2000s.
  • Geographical Original of Music

    Instances in this dataset contain audio features extracted from 1059 wave files. The task associated with the data is to predict the geographical origin of music.
  • TV News Channel Commercial Detection Dataset

    TV Commercials data set consists of standard audio-visual features of video shots extracted from 150 hours of TV news broadcast of 3 Indian and 2 international news channels (...
  • FMA: A Dataset For Music Analysis

    FMA features 106,574 tracks and includes song title, album, artist, genres; play counts, favorites, comments; description, biography, tags; together with audio (343 days, 917...
  • LSVT Voice Rehabilitation

    126 samples from 14 participants, 309 features. Aim
  • Iris

    Famous database; from Fisher, 1936
  • SML2010

    This dataset was collected from a monitoring system installed in a domotic house. It corresponds to approximately 40 days of monitoring data.
  • Relative location of CT slices on axial axis

    The dataset consists of 384 features extracted from CT images. The class variable is numeric and denotes the relative location of the CT slice on the axial axis of the human body.
  • UJI Pen Characters

    Data consists of written characters in a UNIPEN-like format
  • Student Performance

    Predict student performance in secondary education (high school).
  • Dresses_Attribute_Sales

    This dataset contains attributes of dresses and their recommendations according to their sales. Sales are monitored on alternate days.
  • News Aggregator

    References to news pages collected from a web aggregator in the period from 10-March-2014 to 10-August-2014. The resources are grouped into clusters that represent pages...
  • Anonymous Microsoft Web Data

    Log of anonymous users of www.microsoft.com; predict areas of the web site a user visited based on data on other areas the user visited.
  • Syskill and Webert Web Page Ratings

    This database contains HTML source of web pages plus the ratings of a single user on these web pages. Web pages are on four separate subjects (Bands- recording artists; Goats;...
  • Gisette

    GISETTE is a handwritten digit recognition problem. The problem is to separate the highly confusable digits '4' and '9'. This dataset is one of five datasets of the NIPS 2003...
  • AAAI 2014 Accepted Papers

    This data set comprises the metadata for the 2014 AAAI conference's accepted papers, including paper titles, authors, abstracts, and keywords of varying granularity.
  • EEG Eye State

    The data set consists of 14 EEG values and a value indicating the eye state.
  • Steel Plates Faults

    A dataset of steel plates’ faults, classified into 7 different types. The goal was to train machine learning models for automatic pattern recognition.
  • Kinship

    Relational dataset
  • Facebook metrics

    Facebook performance metrics of a renowned cosmetics brand's Facebook page.
  • Page Blocks Classification

    The problem consists of classifying all the blocks of the page layout of a document that has been detected by a segmentation process.
  • Amazon Commerce reviews set

    The dataset is used for authorship identification in online Writeprint, which is a new research field of pattern recognition.
  • QtyT40I10D100K

    Since there is no numerical sequential data stream available in standard data sets, this data set is generated from the original T40I10D100K data set.
  • Health News in Twitter

    The data was collected in 2015 using the Twitter API. This dataset contains health news from more than 15 major health news agencies such as BBC, CNN, and NYT.
  • seeds

    Measurements of geometrical properties of kernels belonging to three different varieties of wheat. A soft X-ray technique and GRAINS package were used to construct all seven,...

  • SIFT10M

    In SIFT10M, each data point is a SIFT feature which is extracted from Caltech-256 by the open source VLFeat library. The corresponding patches of the SIFT features are provided.
  • Gas sensor array under dynamic gas mixtures

    The data set contains the recordings of 16 chemical sensors exposed to two dynamic gas mixtures at varying concentrations. For each mixture, signals were acquired continuously...
  • Repeat Consumption Matrices

    The dataset contains 7 datasets of User - Item matrices, where each entry represents how many times a user consumed an item. Item is used as an umbrella term for various...
  • Thyroid Disease

    10 separate databases from Garavan Institute
  • Reuter_50_50

    The dataset is used for authorship identification in online Writeprint, which is a new research field of pattern recognition.
  • Connectionist Bench (Sonar, Mines vs. Rocks)

    The task is to train a network to discriminate between sonar signals bounced off a metal cylinder and those bounced off a roughly cylindrical rock.
  • HIV-1 protease cleavage

    The data contains lists of octamers (8 amino acids) and a flag (-1 or 1) depending on whether HIV-1 protease will cleave in the central position (between amino acids 4 and 5).
  • ISTANBUL STOCK EXCHANGE

    The data set includes returns of the Istanbul Stock Exchange along with seven other international indices (SP, DAX, FTSE, NIKKEI, BOVESPA, MSCE_EU, MSCI_EM) from Jun 5, 2009 to Feb 22, 2011.
  • ICU

    Data set prepared for the use of participants for the 1994 AAAI Spring Symposium on Artificial Intelligence in Medicine.
  • Activity Recognition from Single Chest-Mounted Accelerometer

    The dataset collects data from a wearable accelerometer mounted on the chest. The dataset is intended for Activity Recognition research purposes.
  • Car Evaluation

    Derived from a simple hierarchical decision model, this database may be useful for testing constructive induction and structure discovery methods.
  • Flags

    From Collins Gem Guide to Flags, 1986
  • Weight Lifting Exercises monitored with Inertial Measurement Units

    Six young healthy subjects were asked to perform 5 variations of the biceps curl weight lifting exercise. One of the variations is the one predicted by the health professional.
  • Mammographic Mass

    Discrimination of benign and malignant mammographic masses based on BI-RADS attributes and the patient's age.
  • Annealing

    Steel annealing data
  • Reuters-21578 Text Categorization Collection

    This is a collection of documents that appeared on Reuters newswire in 1987. The documents were assembled and indexed with categories.
  • Mturk User-Perceived Clusters over Images

    This dataset was collected by Shan-Hung Wu and DataLab members at NTHU, Taiwan. There are 325 user-perceived clusters from 100 users and their corresponding descriptions.
  • Tamilnadu Electricity Board Hourly Readings

    This data can be used effectively to reduce the number of load profile parameters stored in the database.
  • Ecoli

    This data contains protein localization sites
  • Student Loan Relational

    Student Loan Relational Domain
  • YouTube Multiview Video Games Dataset

    This dataset contains about 120k instances, each described by 13 feature types, with class information, especially useful for exploring multiview topics (cotraining, ensembles,...
  • Forest type mapping

    Multi-temporal remote sensing data of a forested area in Japan. The goal is to map different forest types using spectral data.
  • Chronic_Kidney_Disease

    This dataset can be used to predict chronic kidney disease. It was collected from a hospital over a period of nearly 2 months.