Index of /datasets/mnist
.. _datadb_mnist:
*********************************************************************
LeCun et al. (1999): The MNIST Dataset Of Handwritten Digits (Images)
*********************************************************************
The MNIST_ dataset of handwritten digits, available from this page, has a
training set of 60,000 examples, and a test set of 10,000 examples. It is a
subset of a larger set available from NIST. The digits have been
size-normalized and centered in a fixed-size image. It is a good database for
people who want to try learning techniques and pattern recognition methods on
real-world data while spending minimal efforts on preprocessing and formatting.
See http://yann.lecun.com/exdb/mnist for more information.
.. note::
The version that is offered here is identical to the four files distributed
there, but has been converted into a single HDF5 file than can easily be read
by PyMVPA.
Terms Of Use
============
`Yann LeCun`_ (Courant Institute, NYU) and `Corinna Cortes`_ (Google
Labs, New York) hold the copyright of MNIST_ dataset, which is a
derivative work from original NIST datasets. MNIST_ dataset is made
available under the terms of the `Creative Commons Attribution-Share
Alike 3.0`_ license.
.. _MNIST: http://yann.lecun.com/exdb/mnist
.. _Creative Commons Attribution-Share Alike 3.0: http://creativecommons.org/licenses/by-sa/3.0/
.. _Yann LeCun: http://yann.lecun.com/
.. _Corinna Cortes: http://web.me.com/corinnacortes/work/Home.html
Download
========
A single hdf5 file containing entire MNIST_ dataset is available from
http://data.pymvpa.org/datasets/mnist/
Requirements
============
* HDF5 access facility.
* *PyMVPA 0.5* (or later) provides the `h5load()` function (utilizes H5PY_ package).
.. _H5PY: http://h5py.alfven.org/
Instructions
============
>>> from mvpa2.suite import *
>>> filepath = os.path.join(pymvpa_datadbroot, 'mnist', "mnist.hdf5")
>>> datasets = h5load(filepath)
>>> train = datasets['train']
>>> test = datasets['test']
>>> print train
<Dataset: 60000x784@uint8, <sa: labels>>
>>> print test
<Dataset: 10000x784@uint8, <sa: labels>>
>>> # assign a mapper able to recreate 28x28 pixel image arrays
>>> test.a.mapper = FlattenMapper(shape=(28, 28))
>>> test.mapper.reverse(test).shape
(10000, 28, 28)
References
==========
:ref:`LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998) <LBB+98>`.
Gradient-based learning applied to document recognition.
Proceedings of the IEEE, 86, 2278--2324.