Dynamic Time Warping (DTW) is a dynamic programming
algorithm that is known to be one of the best methods
to measure the similarities between two signals, even if there are
variations in the speed of those. It is extensively used in many
machine learning algorithms, especially for pattern recognition
and classification. U nfortunately, i t h as a q uadratic complexity,
which results in very high computational costs. Furthermore,
its data dependency made it also very difficult t o parallelize.
Special attention has been paid to computing DTW on the edge,
as a way to reduce the load of communication on Internet-of-
Thing applications. In this work, we propose a minimum area
implementation of the DTW algorithm in AMD FPGAs with
optimal use of the resources. That is achieved by maximizing
the use time of the resources and taking advantage of the inner
structure of the AMD FPGAs. This architecture could be used in
small devices or as a base for a multi-core implementation with
very high-throughput.