Datasets

  • gene expression cancer RNA-Seq

    Update frequency: irregular
    This collection of data is part of the RNA-Seq (HiSeq) PANCAN data set; it is a random extraction of gene expressions of patients having different types of tumor.
  • NSF Research Award Abstracts 1990-2003

    Update frequency: irregular
    This data set consists of (a) 129,000 abstracts describing NSF awards for basic research, (b) bag-of-word data files extracted from the abstracts, (c) a list of words used for...
  • CalIt2 Building People Counts

    Update frequency: irregular
    This data comes from the main door of the CalIt2 building at UCI.
  • Activity recognition with healthy older people using a batteryless wearable s...

    Update frequency: irregular
    Sequential motion data from 14 healthy older people aged 66 to 86 years old using a batteryless, wearable sensor on top of their clothing for the recognition of activities in...
  • Pioneer-1 Mobile Robot Data

    Update frequency: irregular
    This dataset contains time series sensor readings of the Pioneer-1 mobile robot. The data is broken into "experiences" in which the robot takes action for some period of time...
  • BLOGGER

    Update frequency: irregular
    In this paper, we seek to identify why users are drawn to cyberspace in Kohkiloye and Boyer Ahmad Province in Iran.
  • Document Understanding

    Update frequency: irregular
    Five concepts, expressed as predicates, to be learned
  • Wearable Computing: Classification of Body Postures and Movements (PUC-Rio)

    Update frequency: irregular
    A dataset with 5 classes (sitting-down, standing-up, standing, walking, and sitting) collected over 8 hours of activities of 4 healthy subjects. We also established a baseline...
  • Discrete Tone Image Dataset

    Update frequency: irregular
    Discrete Tone Images (DTI) are available and need to be analyzed in detail. We created this dataset for those doing research on DTI.
  • UJIIndoorLoc

    Update frequency: irregular
    UJIIndoorLoc is a multi-building, multi-floor indoor localization database for testing indoor positioning systems that rely on WLAN/WiFi fingerprints.
  • Spambase

    Update frequency: irregular
    Classifying Email as Spam or Non-Spam
  • Undocumented

    Update frequency: irregular
    Various datasets without documentation (feel free to explore!)
  • OCT data & Color Fundus Images of Left & Right Eyes

    Update frequency: irregular
    This dataset contains OCT data (in mat format) and color fundus data (in jpg format) of left & right eyes of 50 healthy persons.
  • Activity Recognition system based on Multisensor data fusion (AReM)

    Update frequency: irregular
    This dataset contains temporal data from a Wireless Sensor Network worn by an actor performing the activities
  • Cardiotocography

    Update frequency: irregular
    The dataset consists of measurements of fetal heart rate (FHR) and uterine contraction (UC) features on cardiotocograms classified by expert obstetricians.
  • Polish companies bankruptcy data

    Update frequency: irregular
    The dataset is about bankruptcy prediction of Polish companies. The bankrupt companies were analyzed in the period 2000-2012, while the still operating companies were evaluated...
  • Multiple Features

    Update frequency: irregular
    This dataset consists of features of handwritten numerals (0-9) extracted from a collection of Dutch utility maps.
  • Dorothea

    Update frequency: irregular
    DOROTHEA is a drug discovery dataset. Chemical compounds represented by structural molecular features must be classified as active (binding to thrombin) or inactive. This is one...
  • Climate Model Simulation Crashes

    Update frequency: irregular
    Given Latin hypercube samples of 18 climate model input parameter values, predict climate model simulation crashes and determine the parameter value combinations that cause the...
  • IPUMS Census Database

    Update frequency: irregular
    This data set contains unweighted PUMS census data from the Los Angeles and Long Beach areas for the years 1970, 1980, and 1990.
  • Sentiment Labelled Sentences

    Update frequency: irregular
    The dataset contains sentences labelled with positive or negative sentiment.
  • DrivFace

    Update frequency: irregular
    DrivFace contains image sequences of subjects driving in real scenarios. It is composed of 606 samples of 640×480, acquired over different days from 4 drivers with...
  • Gas sensor array under flow modulation

    Update frequency: irregular
    The data set contains 58 time series acquired from 16 chemical sensors under gas flow modulation conditions. The sensors were exposed to different gaseous binary mixtures of...
  • Appliances energy prediction

    Update frequency: irregular
    Experimental data used to create regression models of appliances energy use in a low energy building.
  • Nomao

    Update frequency: irregular
    Nomao collects data about places (name, phone, localization...) from many sources. Deduplication consists of detecting which data refer to the same place. Instances in the...
  • Dishonest Internet users Dataset

    Update frequency: irregular
    The dataset was used to test an architecture based on a trust model capable of evaluating the trustworthiness of users interacting in pervasive environments.
  • MiniBooNE particle identification

    Update frequency: irregular
    This dataset is taken from the MiniBooNE experiment and is used to distinguish electron neutrinos (signal) from muon neutrinos (background).
  • BLE RSSI Dataset for Indoor localization and Navigation

    Update frequency: irregular
    This dataset contains RSSI readings gathered from an array of Bluetooth Low Energy (BLE) iBeacons in a real-world and operational indoor environment for localization and...
  • Soybean (Small)

    Update frequency: irregular
    Michalski's famous soybean disease database
  • Predict keywords activities in a online social media

    Update frequency: irregular
    The Twitter data was collected over 360 consecutive days by querying 1497 English keywords sampled from Wikipedia. This dataset is proposed in a Learning to...
  • Arcene

    Update frequency: irregular
    ARCENE's task is to distinguish cancer versus normal patterns from mass-spectrometric data. This is a two-class classification problem with continuous input variables. This...
  • Daily Demand Forecasting Orders

    Update frequency: irregular
    The dataset was collected over 60 days; it is a real database from a Brazilian logistics company.
  • Forest Fires

    Update frequency: irregular
    This is a difficult regression task, where the aim is to predict the burned area of forest fires, in the northeast region of Portugal, by using meteorological and other data...
  • Condition monitoring of hydraulic systems

    Update frequency: irregular
    The data set addresses the condition assessment of a hydraulic test rig based on multi-sensor data. Four fault types are superimposed with several severity grades impeding...
  • Diabetes

    Update frequency: irregular
    This diabetes dataset is from AIM '94
  • Hayes-Roth

    Update frequency: irregular
    Topic
  • Open University Learning Analytics dataset

    Update frequency: irregular
    The Open University Learning Analytics Dataset contains data about courses, students, and their interactions with the Virtual Learning Environment for seven selected courses and more...
  • Gesture Phase Segmentation

    Update frequency: irregular
    The dataset is composed of features extracted from 7 videos of people gesticulating, for studying Gesture Phase Segmentation. It contains 50 attributes divided into two...
  • DBWorld e-mails

    Update frequency: irregular
    It contains 64 e-mails which I manually collected from the DBWorld mailing list. They are classified in
  • Twin gas sensor arrays

    Update frequency: irregular
    5 replicates of an 8-MOX gas sensor array were exposed to different gas conditions (4 volatiles at 10 concentration levels each).
  • Ultrasonic flowmeter diagnostics

    Update frequency: irregular
    Fault diagnosis of four liquid ultrasonic flowmeters
  • Physicochemical Properties of Protein Tertiary Structure

    Update frequency: irregular
    This is a data set of physicochemical properties of protein tertiary structure, taken from CASP 5-9. There are 45730 decoys, with size varying from 0 to 21 angstroms.
  • Blood Transfusion Service Center

    Update frequency: irregular
    Data taken from the Blood Transfusion Service Center in Hsin-Chu City in Taiwan -- this is a classification problem.
  • EMG dataset in Lower Limb

    Update frequency: irregular
    3 different exercises
  • Activities of Daily Living (ADLs) Recognition Using Binary Sensors

    Update frequency: irregular
    This dataset comprises information regarding the ADLs performed by two users on a daily basis in their own homes.
  • Tennis Major Tournament Match Statistics

    Update frequency: irregular
    This is a collection of 8 files containing the match statistics for both women and men at the four major tennis tournaments of the year 2013. Each file has 42 columns and a...
  • Parkinson Disease Spiral Drawings Using Digitized Graphics Tablet

    Update frequency: irregular
    The handwriting database consists of 62 PWP (People with Parkinson) and 15 healthy individuals. Three types of recordings (Static Spiral Test, Dynamic Spiral Test and Stability Test)...
  • PubChem Bioassay Data

    Update frequency: irregular
    These highly imbalanced bioassay datasets are from the differing types of screening that can be performed using HTS technology. 21 datasets were created from 12 bioassays.
  • Auto MPG

    Update frequency: irregular
    Revised from the CMU StatLib library; the data concerns city-cycle fuel consumption.
  • Function Finding

    Update frequency: irregular
    Cases collected mostly from investigations in physical science; intention is to evaluate function-finding algorithms