Shaky video footage of ordinary, day-to-day sights (a doorway, a boat docked in a canal, bicycles) plays as a man walks down an Amsterdam street. In the top-left corner of the screen, text appears describing the sidewalk sights.
The text in the video, by U.S. artist and coder Kyle McDonald, was generated in real time by a neural network. McDonald's network, based on a system called NeuralTalk developed by Stanford Ph.D. student Andrej Karpathy, analyzes live webcam footage from a laptop and describes what it sees in text.
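The basic shape of such a captioning system can be sketched in a few lines: a convolutional network encodes each frame into a feature vector, and a recurrent network decodes that vector into words, one at a time. The sketch below uses stand-in functions with made-up names and fixed outputs; it is not NeuralTalk's actual code, only an illustration of the loop.

```python
# A highly simplified sketch of an image-captioning pipeline like
# NeuralTalk's. All names and values here are illustrative stand-ins,
# not the real system's API.

def encode_image(frame):
    # Stand-in for a convolutional network; a real encoder would
    # produce a learned feature vector from the pixels.
    return [0.1, 0.5, 0.3]

def decode_caption(features, max_words=5):
    # Stand-in for a recurrent decoder: emit words until an end token.
    # A real decoder samples each word from a probability distribution
    # conditioned on the image features and the words so far.
    vocabulary = ["a", "man", "holding", "a", "drink", "<end>"]
    words = []
    for step in range(max_words):
        word = vocabulary[step]
        if word == "<end>":
            break
        words.append(word)
    return " ".join(words)

frame = "webcam pixels"  # placeholder for real image data
print(decode_caption(encode_image(frame)))
```

In the live demo, this loop runs continuously on each new webcam frame, which is why the on-screen text updates as the camera moves.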
Some of the computer system's descriptions are more accurate than others. For instance, the network described a man wearing a baseball hat and eating a hot dog as "a man in a suit and tie holding a drink."
Despite such errors, this video-based system is part of a wave of new software that can interpret images with near-human accuracy.
One example would be familiar to most people: whenever you upload a photo to Facebook, facial-recognition software helps you tag your friends.