Reconstructing the World With Flickr

As of September 2010, Flickr hosted 5 billion images, with 3,000 uploaded each minute.  Facebook hosts even more.

These massive photo sharing systems provide an ever-growing imagery dataset.  In recent years, researchers have been processing images from these sites using computer vision techniques to reconstruct real world scenes in 3D. Read the rest of this entry »

Digital Media and Technology Trends for 2011

In 2010, we witnessed a number of important developments in the world of digital media and technology.  The iPad became the first commercially successful tablet, a new type of computer security threat appeared, and Facebook grew to over 500 million users, just to name a few.   As 2011 arrives, here are trends I will be keeping an eye on.

  • Apps in the Living Room: An increasing number of televisions are connected to the internet.  Many of these have pre-installed applications to access YouTube, Netflix, Pandora, and other popular services.  Internet connected television platforms from Boxee, Google, Yahoo, LG, and others make it possible for viewers to install additional applications by browsing app directories or bookmarking TV optimized websites.  It remains an open question which “lean back” apps will prove popular with consumers.  In the next twelve months we will have a better idea of the types of apps that are right for the living room versus those that are better suited for single users on a PC or mobile device.
  • Further Definition of the New Music Industry: Overall, the music business will continue its ongoing process of consolidation.  The notion of owning music in the form of a CD or a digital download will continue to erode, particularly for Generation Y, as it is replaced by cloud driven subscription models.  Internet data driven charts such as those powered by BigChampagne and The Echo Nest will rise in influence as sales and airplay based charts lose relevance.  Many of my thoughts are echoed in this Hypebot interview with Music 3.0 author Bobby Owsinski.
  • 3D Backlash: 3D was heavily pushed by the film and consumer electronics industries this past year.  There was an overabundance of 3D film releases in 2010, and a number exhibited poorly implemented stereoscopic effects.  Consequently, moviegoers will be more reluctant to pay extra to see a film in 3D.  Though 3D capable televisions were touted to consumers, there is a lack of 3D content for these devices.  Unless the content availability issue is resolved, 3D television will be of interest to only the very early adopters.
  • Kinect Hacks: For years, researchers have utilized computer vision to build gesture based interfaces.  An excellent example is the M.I.T. Media Lab’s Smart Rooms projects of the 1990s.  Depth sensing cameras, however, could cost in the $30,000 range, a price which prevented many developers from exploring this area of interface design.  The Microsoft Kinect is an amazing piece of hardware in that it provides a highly functional depth sensing camera system for only $150.  Realizing its potential, hobbyists and researchers have quickly built up a thriving Kinect hacking community, much as happened after the release of the Wii.  With open source efforts now springing up to support the Kinect, this community will continue to thrive and innovate in 2011.  Their work will hopefully inspire and guide the design of innovative gesture driven games and applications for the mass market.

Register for “The Augmented Reality Event” (ARE), Discounts Available

Please join me at one of the first events completely dedicated to the business of Augmented Reality: The Augmented Reality Event, presented by Qualcomm, taking place June 2-3, 2010  at the Santa Clara Convention Center in Silicon Valley.

Conference highlights include keynotes by Bruce Sterling and Will Wright among others.  On Thursday I will be giving a talk titled “Augmented Reality in Music Entertainment: Then and Now.”

You can receive the reduced $195 registration price – $200 off the $395 normal price – by using Discount Code: E195 during registration. This fee includes both conference days, lunch, reception and more. Read the rest of this entry »

Enabling Interactive Concert Experiences With Smartphones

During any concert today, fans are actively using their mobile devices. They are taking photos and videos, sending text messages, posting tweets, and updating their Facebook status.

Using mobile apps like those from Ustream and Qik, a few are even live streaming the show. This behavior presents a compelling opportunity to use these devices to let fans interact with the on stage artist and become participants in the performance.

One of the most ambitious efforts in this area is being led by techno DJ and producer Richie Hawtin. An established music technology innovator, Hawtin gained attention in recent years for tweeting his DJ sets. As part of his current tour performing as his alter ego, Plastikman, he and his collaborators have released the “SYNK” application for the iPhone and the iPod Touch. Read the rest of this entry »

Siri and the Emergence of the Virtual Personal Assistant

Computing pioneers Vannevar Bush, J.C.R. Licklider, and Doug Engelbart envisioned computers as a way to extend the human mind’s capabilities. Their ideas proposed that by delegating a portion of our tasks to computing systems, we could more effectively manage the increasing complexity of our lives.

In 1997, I attended a brilliant presentation by wearable computing pioneer Dr. Thad Starner that made me aware of how this vision would be realized.  At the time, Thad wore a PC/104 based computer equipped with a “Private Eye” head worn display, a Twiddler chorded keyboard, and a CDPD wireless internet connection. With a series of demonstrations, he illustrated the concept of contextually aware computing, in which knowledge of location, time, and past user behaviors can be leveraged to better assist a person in completing their tasks. The idea is that through contextual information and a growing body of knowledge of a user’s habits, a computer interface can evolve to fit the user, as opposed to the user having to adapt to a static interface. He described how, over time, such an interface could learn enough about an individual to become a “digital doppelganger” which could independently handle a number of one’s routine responsibilities. As an example, he described a scenario in which it is December and your wearable computer uses its knowledge of your gift buying habits to complete all of your Christmas shopping on your behalf.

Read the rest of this entry »

The Reality of Augmented Reality

In 2009, augmented reality (AR) technology became mainstream. Though it has been under development for over four decades, in the past year it was prominently featured in major ad campaigns and was on the cover of Esquire. Concurrently, Layar, Wikitude, and a number of other AR applications were released for mobile phones.  The future potential of AR has now captured the imagination of both the public and the press. The hype surrounding this technology is similar to the excitement over virtual reality during the 1990s and 3D online communities, namely Second Life, during this past decade. Unfortunately, in the minds of consumers, neither of these technologies lived up to the hype. Due to a lack of understanding, virtual reality and 3D online communities were unfairly and prematurely dismissed as failures by many. AR is in danger of suffering the same fate. Geoff Northcott described the situation well in his post Augmented Reality, Second Life, and the Trough of Disillusionment.

In an effort to help manage expectations regarding AR technology, I will briefly describe what works today while clarifying what we can expect in the future.

Read the rest of this entry »

Improving Music Search With Machine Learning

Popular music search and discovery systems such as Pandora and rely primarily upon human entered annotations to properly classify songs for search retrieval.  Though effective, human centric approaches to music classification are labor intensive and the recommendations that can be generated are limited in scope. For instance, a person must know the name of a particular artist or track in order to receive a recommendation. This situation is not a problem for music fans and aficionados, but it tends to limit the discovery possibilities for casual listeners who may not know a wide variety of artists and track names.

Researchers at the University of California San Diego Computer Audition Lab have developed a system that could address this problem by allowing people to find music using descriptive words rather than artist names and song titles.  For instance, a person could enter the words “high energy guitars” or “romantic vocals” and then receive a list of tracks that match that description.

The UCSD system is capable of ingesting songs and automatically tagging them with annotation data without human intervention. To provide accurate results, the system must first be taught to hear music and describe it using natural language.  The training process uses digital signal processing and machine learning algorithms to expose the system to a broad array of music along with the words people use to describe it.  For example, to be able to accurately identify music that is referred to as “driving rock”, the system must analyze a large number of driving rock songs and then identify the signal patterns that make that particular style of song unique.
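
To make the training idea concrete, here is a minimal sketch in Python. It is not the researchers' actual model: it swaps in a simple nearest-centroid classifier, and the three-number "audio features" (tempo, distortion, vocal softness) and the tag names are made up for illustration. A real system would extract far richer features from the audio signal itself.

```python
from collections import defaultdict
from math import sqrt


def train_tag_centroids(songs):
    """Average the feature vectors of all songs that share a tag.

    songs: list of (feature_vector, set_of_tags) pairs.
    Returns a dict mapping each tag to its mean feature vector.
    """
    sums = {}
    counts = defaultdict(int)
    for features, tags in songs:
        for tag in tags:
            if tag not in sums:
                sums[tag] = [0.0] * len(features)
            sums[tag] = [s + f for s, f in zip(sums[tag], features)]
            counts[tag] += 1
    return {tag: [s / counts[tag] for s in total] for tag, total in sums.items()}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def auto_tag(features, centroids, top_n=1):
    """Return the top_n tags whose centroids best match an unlabeled song."""
    ranked = sorted(centroids, key=lambda t: cosine(features, centroids[t]), reverse=True)
    return ranked[:top_n]


# Hypothetical training set: (tempo, distortion, vocal softness), each in 0..1.
TRAINING = [
    ([0.9, 0.8, 0.1], {"driving rock"}),
    ([0.8, 0.9, 0.2], {"driving rock"}),
    ([0.2, 0.1, 0.9], {"romantic vocals"}),
    ([0.3, 0.2, 0.8], {"romantic vocals"}),
]
CENTROIDS = train_tag_centroids(TRAINING)

# A new, untagged song that is fast and distorted.
print(auto_tag([0.85, 0.7, 0.15], CENTROIDS))
```

Once every song carries tags like these, a text query such as "driving rock" becomes a simple lookup over the tags rather than a search by artist or title.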

The researchers have been gathering training data through crowdsourcing using an innovative Facebook game called “Herd-It”.  In this game, users are played a song snippet and asked to associate descriptive words and phrases with it.  Users earn points based on how well their answers match those of previous players.  Here’s a video describing the game.
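
The agreement based scoring can be sketched in a few lines of Python. The game's real scoring formula is not described here, so the rule below (points proportional to the share of previous players who gave the same answer) and the function name are assumptions for illustration only.

```python
from collections import Counter


def agreement_points(answer, previous_answers, max_points=100):
    """Hypothetical scoring rule: award points in proportion to the share
    of previous players who gave the same answer (case-insensitive)."""
    if not previous_answers:
        return 0  # nobody to agree with yet
    counts = Counter(a.strip().lower() for a in previous_answers)
    share = counts[answer.strip().lower()] / len(previous_answers)
    return round(max_points * share)


# Four earlier players described the same song snippet.
previous = ["mellow", "mellow", "Mellow", "angry"]
print(agreement_points("mellow", previous))  # 3 of 4 earlier players agree
print(agreement_points("angry", previous))   # 1 of 4 earlier players agree
```

A rule like this rewards consensus, which is what makes the collected words usable as training labels: answers that many independent players agree on are more likely to describe the song accurately.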

The research group’s latest work in improving automatic music analysis was recently presented at the 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) in the paper “Dynamic Texture Models of Music,” by Luke Barrington, Antoni Chan, and Gert Lanckriet.

With the continuing decline of the radio DJ as taste maker, web based music search and discovery tools will become increasingly important. With further development, machine learning driven music search systems such as this one could provide an intuitive and compelling method for listeners to find music they will enjoy.

Live Concerts in Your Hand: Big Boi and Blink-182 in Augmented Reality

Recently, Doritos began an innovative campaign, Doritos Late Night, in which you can use a webcam and a bag of chips to see a concert appear in your hands. Bags of Doritos have been printed with a computer vision tracking marker which the webcam detects and uses to render a pre-recorded 3D concert. To see the 3D concert, you must first purchase a bag of Doritos printed with the marker. Next, you plug in your webcam, visit and hold the bag in front of the camera. You can choose to see concerts from either Big Boi or Blink-182. The site was developed using the Flash AR Toolkit (FLARToolkit).

Here are video captures of the performances.

This promotion is an excellent example of using innovative technology and engaging content to capture audience attention. Simultaneously, it provides a unique avenue for artists to promote their music and live shows.

Augmented reality (AR) is a technology that has been around for quite a while, primarily in the academic and research domains. Recently, its mainstream presence has increased due to the development of the Flash version of the Augmented Reality Toolkit (FLARToolkit). The ARToolkit was developed by Dr. Hirokazu Kato with Dr. Mark Billinghurst at the University of Washington’s Human Interface Technology Lab (HITLAB) over ten years ago.

In 2000-2001, I led a team which modified the original C based ARToolkit to work on 3D accelerated desktop Windows PCs. We used the toolkit to develop interactive augmented reality projections for the band Duran Duran’s live concert tour. Here’s a video of ARToolkit effects that we used in the live shows.

The project was documented in this presentation at the 2002 Augmented Reality Toolkit Workshop:

Jarrell Pair, Jeff Wilson, Jeff Chastine, Maribeth Gandy. “The Duran Duran Project: The Augmented Reality Toolkit in Live Performance”. The First IEEE International Augmented Reality Toolkit Workshop, 2002. Download PDF

Quividi: Smart Signage

As advertisers increasingly use digital signage, there will be a demand for detailed audience data akin to what is delivered by web analytics systems. Quividi has developed a camera based solution for measuring impressions, watcher counts, and attention time for ads shown on displays inside stores, on sidewalks, and in other out of home locations. Using facial recognition technology, the system can also target ads to an audience’s gender. Similar advertising technology was depicted in the 2002 science fiction film, Minority Report. Obviously, this product raises significant privacy concerns. Quividi addresses this issue by stating that no video is ever recorded, only the data derived from the processed footage. Here’s a short piece on Quividi from Advertising Age.
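
To show what such derived data might look like, here is a minimal Python sketch. It is not Quividi's implementation or data format: the Sighting record, its fields, and the metric definitions (an impression is anyone seen near the display, a watcher is anyone who looked at it) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sighting:
    """One anonymous person seen near the display (hypothetical record).
    Only durations are kept; no image or video is stored."""
    seconds_present: float
    seconds_looking: float


def summarize(sightings):
    """Roll per-person sightings up into signage analytics."""
    watchers = [s for s in sightings if s.seconds_looking > 0]
    total_attention = sum(s.seconds_looking for s in watchers)
    return {
        "impressions": len(sightings),  # everyone who passed the display
        "watchers": len(watchers),      # those who actually looked at it
        "avg_attention_seconds": total_attention / len(watchers) if watchers else 0.0,
    }


# Three passers-by; two of them glanced at the ad.
report = summarize([Sighting(10, 0), Sighting(8, 2.0), Sighting(12, 4.0)])
print(report)
```

Keeping only aggregate numbers like these, rather than footage, is the kind of design the privacy claim above describes.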

Mobile Personal Broadcasting:, Qik, Kyte, Flixwagon

In 2007, the phenomenon of thrust the notion of personal live broadcasting into mainstream internet culture.  Anyone with an internet connection and a USB webcam now has a plethora of options for live broadcasting with sites such as Stickam,, and others. In the mobile arena, Qik, Kyte, and Flixwagon have released applications allowing users to live stream from smartphones.  This week, another player emerged with releasing their mobile broadcasting platform which combines live video streaming with GPS mapping, voting, and live chat.  Currently, the Nokia S60 series phones are the preferred hardware devices since Apple has been reluctant to approve live video streaming applications for the iPhone.  However, Qik,, and Flixwagon do provide applications for jailbroken iPhones.

Now that it’s possible, what will be the breakthrough applications for mobile live broadcasting?   I think the answer may lie in looking at the trend of celebrities using Twitter.  Twitter is popular with stars because it is simple and easy to maintain.  It can be almost spontaneously updated unlike a traditional blog or personal website.  Celebrities can easily enhance this fan communication channel using, Qik, Kyte, or Flixwagon.  In particular, mobile broadcasting could be appealing to touring musicians who rarely have an opportunity to sit at a computer and send a well composed personal blog post. Rapper Soulja Boy has been an early adopter in this area by using Kyte’s mobile platform to keep his fans in the loop.  In a similar fashion, Lil Wayne is set to begin using’s mobile application.  Mobile broadcasting is clearly a concept to keep an eye on over the next year.

Here’s’s mobile demo video.