Audio Mostly 2007 - A report on game audio related presentations
The 2nd Audio Mostly conference, organised by the Fraunhofer Institute and Technische Universitat Ilmenau, was held in Ilmenau, Germany, 27th-28th September 2007. This is a brief report on the parts most strongly pertaining to virtual world and game audio; it is by no means inclusive of the wide range of topics presented, and some papers are omitted. For a full account of the conference, and to locate the complete proceedings, visit http://www.audiomostly.com/ or order ISBN 978-3-00-022823-0.
I have arranged these by topic according to their relevance to virtual world and game audio.
The mappings between various input devices, generation systems and processing applications are considered by Dorota in this overview, based on her extensive experience of building interactive audio systems. This mapping, always at the heart of any interactive installation, involves complex relationships between input vectors and output signals. Dorota explains the practical requirements for developing a sound space, such as collecting appropriate sound material, testing the input space and tweaking the sensors, and rehearsing choreography and movement while developing the sounds. She also introduces some important software engineering wisdom, such as considering the lifetime of the project, how output material can be archived for future reference, and the difficulty of storing and replaying what is a subjective experience given the limitations of video and audio recording.
Another paper considering the practical aspects of interactive installation design, in this case for robust installations with many users in a public space. The project was a game with audio feedback from a ping-pong table that saw around 350 users per day, including children, over a 7-month period of continuous use. During this time no system problems or crashes occurred. The keys to robustness given are simplicity of design, use of tested off-the-shelf components, and avoiding technology that is too "new". The analysis includes construction, setup times and budgets.
I thought this paper was going to be very similar to Hannes Raffaseder's "Sound table tennis", but in fact it is a different take altogether, focusing more on practical audio game design and the problems of moving a paradigm from one implementation to another. The project creates an audio-only version of "pong" in which an invisible ball is given a position only by audio localisation. Input transducers tried included joysticks and slider bars, which give an absolute position; the mouse and other "relative position" trackers proved unsuitable. The results showed no apparent difference in ability between visually impaired and sighted participants.
Nina presents experimental results from a number of interesting installations in which the audience are participants in a procedurally generated soundscape. Position, speed of movement, ground response pressure and other input vectors are used to drive fairly abstract sound scenes. This also falls into the area of psychology research, since it is interesting to watch videos of participants exploring the installation space and trying to work out how their actions map onto the result. It has relevance to input systems development for games using open spaces and multiple transducer sources. Nina then explores strain gauge transducers attached directly to the body as input systems during dance performance.
Yann has enjoyed fame after a video of his development work was posted on YouTube. The Wii Loop Machine, designed in Pd/Max, is effectively a short-loop player in the style of Ableton Live or Sonic Foundry Acid, but Yann maps the signals from a Wii remote to the tasks of triggering, setting loop points, playback direction and extra effects. This creates an extremely flexible and fun way to perform loop-based music, which Yann demonstrated during a concert performance. The actions of the performer are fun to watch too, which suggests the possibility of dance-type musical games or performances.
This paper initially seemed to have no game applications, but in fact it has powerful applications in installation spaces for gaming. Triangulation of sources by hyperbolic correlation is used on radio microphones to accurately position a performer on a sound stage. The position data are converted to MIDI, and could be encoded for multi-channel surround sound.
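The core idea here is time-difference-of-arrival (TDOA) localisation: each pair of microphones constrains the source to a hyperbola, and the position most consistent with all measured delays is the estimate. The sketch below is an illustrative toy only, not the paper's implementation; the 2D setup, function names and brute-force grid search are my own assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second

def tdoas(src, mics):
    """Time differences of arrival relative to the first microphone."""
    times = [math.dist(src, m) / SPEED_OF_SOUND for m in mics]
    return [t - times[0] for t in times[1:]]

def locate(measured, mics, extent=10.0, step=0.05):
    """Brute-force least-squares search over a square stage area for
    the position whose predicted TDOAs best match the measured ones."""
    best, best_err = None, float("inf")
    n = int(extent / step)
    for i in range(n + 1):
        for j in range(n + 1):
            p = (i * step, j * step)
            err = sum((a - b) ** 2 for a, b in zip(tdoas(p, mics), measured))
            if err < best_err:
                best, best_err = p, err
    return best
```

A real system would estimate the delays by cross-correlating the radio microphone signals and use a closed-form or iterative solver rather than a grid scan.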
Maia gave us a quick overview of a system called AMEE that creates music by annealing objects in a pipeline with pre-stored patterns in abstract form. A set of "producers" connected to world events sets in motion a stream of objects that correspond to phrases, transitions, melodic motifs and so on. As these pass along the pipeline they can be adapted in real time to add emotional nuance to the piece by changing key/mode, applying decelerando, accentuation and other traditional musical devices. The pipeline paradigm seems appropriate since it is similar to the way some real composers work. The examples played seemed quite good, with the ability to add tension, sadness or liveliness to well-known classical compositions.
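The pipeline idea can be illustrated with a toy sketch. All names here are hypothetical, not AMEE's actual API: abstract phrase objects flow through a chain of transformers, each nudging the material towards an emotional target.

```python
def phrase(notes, tempo=120):
    """A minimal 'phrase object': MIDI note numbers plus a tempo."""
    return {"notes": list(notes), "tempo": tempo}

def to_minor(p):
    # crudely flatten major thirds and sixths (pitch classes 4 and 9)
    p["notes"] = [n - 1 if n % 12 in (4, 9) else n for n in p["notes"]]
    return p

def decelerando(p, factor=0.8):
    # slow the phrase down for a sadder, heavier feel
    p["tempo"] = int(p["tempo"] * factor)
    return p

def run_pipeline(p, stages):
    """Pass a phrase object through each transformer in turn."""
    for stage in stages:
        p = stage(p)
    return p
```

Running a C major triad phrase through `[to_minor, decelerando]` yields a C minor phrase at a slower tempo, which is the flavour of run-time adaptation the examples demonstrated.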
Ryan and Michael attempt to bring together several disparate analysis and generation methods. To understand why this is important and difficult work, see http://www.obiwannabe.co.uk/html/papers/proc-audio/proc-audio.html for an overview of AI production methods. In Ryan and Michael's system each analysis tool is plugged into an adaptor which stores musical information in a common abstracted format. There are interface adaptors to a number of music generation subsystems too, so the entire ensemble acts as a kind of general function junction tying together different analysis and production tools. The intermediate representation is designed to be as flexible as possible for compatibility with MusicXML, GUIDO, MusicData, LilyPond and so forth, so it can also generate sequence files and score notation for printing.
Game music that delivers some kind of narrative according to dynamic events is considered, not just as decoration or accompaniment but as a tool for mediating meaning. Axel and Knut ponder once again the difference between artistically strong but fixed pre-composed music and very flexible but often artistically void generative scores. The Magdeburg team develop Wingstedt's sixfold classification of score semantics (Emotive, Informative, Depictive, Guiding, Temporal and Rhetorical) in a generative context. Expressive performance manipulation was described (interesting in the context of Maia Hoeberechts' presentation, where similar principles of abstracting score from run-time performance are possible) in which passages were made to play sad, happy or excited from the same underlying musical structure. The paper also details determining sequential arrangement, overlay of polyphonic parts, counter-melody and harmony at runtime. The concept of structural protection is introduced as a method of melodic/harmonic interpolation that avoids silly-sounding transitions between generated score parts.
What if the environment could determine what kind of music we listen to? This is the question the Dublin team attempt to answer. Gordon presented playlist construction from environmental data such as temperature, weather, geolocation, time of day and other contexts as a way to manage the very large collections amassed by most music listeners. Work on feature extraction from timbre, rhythm/tempo, key and other musical characteristics is also presented. The analysis cannot be separated from the selection algorithm, and so the team present an entire system in which playlist selection is essentially a small expert system and pattern recognition tool designed to choose the best music for a given context. This obviously has extremely useful applications in game music delivery, where virtual world environmental contexts can be used as the input data.
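As a rough illustration of the approach (the feature names and rules below are my own assumptions, not the Dublin system's), context-driven selection can be reduced to scoring each track against the current environmental context and taking the best matches:

```python
def score(track, context):
    """Award a point for each context rule the track satisfies
    (rules and feature names are illustrative only)."""
    s = 0.0
    if context.get("time_of_day") == "night" and track["energy"] < 0.5:
        s += 1.0
    if context.get("weather") == "rain" and track["valence"] < 0.5:
        s += 1.0
    if context.get("activity") == "gaming" and track["tempo"] >= 120:
        s += 1.0
    return s

def playlist(tracks, context, n=5):
    """Return the n best-matching tracks for the current context."""
    return sorted(tracks, key=lambda t: -score(t, context))[:n]
```

A real system would learn these rules from listening history rather than hard-coding them, but the expert-system flavour is the same: context in, ranked playlist out.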
User playlist generation seems to be a hot topic. In this presentation Martin G, Elias P and Martin T give us a GUI and selection algorithm based on heuristic style-similarity metrics determined by the user in an adaptive process. An evaluation of marketability and generated playlist quality was designed and executed, indicating good demand and reasonable usability.
The team from Belarus give us an audio database and search method based on a hierarchical "cellular tree" that represents a high-dimensional feature space. Designed for fast browsing and search, the system offers several unique and interesting capabilities. Each cell contains aggregate data so that partial (progressive) queries can be served while navigating the spanning tree. The feature space is non-linear and non-unique, with multi-metric accessor methods that trade off tree balancing and I/O speed against a flexibility of representation more appropriate to music tagging.
Steffan's work blurs the boundary between the professional DJ and the audience in terms of music selection, and has profound implications for "interactive club" music and social participation. Using mobile devices and data taken from artist, genre and style metadata, the system attempts to turn clubbing into a multi-player game where the audience participate in the process of music selection and mixing. Steffan described a Java-based prototype in which users submit candidate songs from a networked mobile device, which are then evaluated in a "round based" game. Selection is determined by a "landscape" based distance algorithm which plots a course through a suitable mix of songs in order to preserve mood, reflect democratic requests and minimise abrupt changes in tempo or style.
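One plausible reading of such a "landscape" course-plotting algorithm (a sketch under my own assumptions, not Steffan's actual implementation) is a greedy walk through a feature space that always steps to the nearest remaining candidate, so tempo and style changes stay small:

```python
def distance(a, b):
    """Feature-space distance: penalise tempo jumps and style changes."""
    tempo_term = abs(a["tempo"] - b["tempo"]) / 20.0
    style_term = 0.0 if a["style"] == b["style"] else 1.0
    return tempo_term + style_term

def plot_course(current, candidates):
    """Greedily order candidate songs so that each step stays close to
    the previous one in the feature landscape."""
    order, pool = [], list(candidates)
    while pool:
        nxt = min(pool, key=lambda s: distance(current, s))
        pool.remove(nxt)
        order.append(nxt)
        current = nxt
    return order
```

Democratic requests could be folded in by subtracting a vote count from the distance, biasing the course towards popular submissions without allowing abrupt jumps.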
My own paper on advanced synthetic game audio may be found at http://obiwannabe.co.uk/html/papers/audiomostly/AudioMostly2007-FARNELL.pdf. I attempt to overview the requirements for a next-generation game audio engine that uses synthesis parameterised by physics engine data, rather than the sample selection and treatment systems currently deployed in engines like FMOD and Wwise. Examples for fire, water, animals and machines are presented, along with a description of how they are parameterised by game world RTCCs (real-time continuous controllers) over the OSC protocol. A complete system building on the MPEG4SA framework, similar to Ulrich Reiter's "TANGA", is being developed, but specifically for multiplayer games, where additional difficult problems like network replication and time-tagging of events are involved.
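For readers unfamiliar with OSC, the wire format is simple enough to hand-roll: an address pattern, a type tag string, and big-endian arguments, with every string null-terminated and zero-padded to a multiple of four bytes. The sketch below encodes a float controller message; the address `/fire/intensity` is just an illustrative example of an RTCC destination, not a fixed part of any protocol.

```python
import struct

def _pad(b: bytes) -> bytes:
    # OSC strings are null-terminated and padded to a multiple of 4 bytes
    return b + b"\x00" * (4 - len(b) % 4)

def osc_message(address: str, *values: float) -> bytes:
    """Encode a minimal OSC message carrying float32 arguments."""
    msg = _pad(address.encode("ascii"))
    msg += _pad(("," + "f" * len(values)).encode("ascii"))
    for v in values:
        msg += struct.pack(">f", v)  # big-endian float32
    return msg
```

The resulting bytes can be sent over UDP to any OSC-aware synthesis engine such as Pd, which is how the physics engine parameters reach the synthesis layer in my system.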
Acronymically, and with some humour, "The Advanced Next Generation Audio" is a dynamically reconfigurable audio engine based on scene descriptors from MPEG4SA. Ulrich considers the problems of DSP graph transcription to maximise parallelism and demonstrates a heuristic approach to threading concurrent DSP procedures with minimal blocking. It is a modular system written in C++ and compatible with a number of audio I/O layers such as ASIO, ALSA and PortAudio (JACK?). Additionally, Ulrich gave a small group of attendees a private demonstration of the MPEG4SA system for implementing TANGA delivery at Fraunhofer's listening laboratory. Two models for source localisation, a perceptual model and a physical model, were demonstrated in a virtual walkthrough of a building. I am personally very interested in this system and its potential for carrying real-time parameterised synthetic sources for a game.
Simon gave us a live overview of the current state of the Wwise engine, including a very enthusiastic and fun demo containing great examples of masking, ducking, transient expansion, camera-driven focus and EQ-based occlusion. While the engine is not "new technology", and the capabilities and techniques are common to traditional audio production from the last 30 years, the continued development of Wwise seems promising, including a roadmap towards better physics-engine-driven real-time parameterisation and synthesis capabilities. The most impressive feature of Wwise is clearly its GUI and object-based parameter matrix, with neat graphical icons to help non-programmers develop scene data.
How do we take mono recordings from old films and extract multi-channel audio for surround sound systems? It seems impossible! But Christian, Andreas and Michael have developed a means to extract the ambience from the direct dry signals, and to do it quickly, using a method called non-negative sparse matrix factorisation. A spectrogram shows us how ambience appears as blurred trails that follow the direct signal. By determining which components belong to the direct signal, those beyond a certain threshold are sent to the rear channels of a 5.1 surround system, so that even mono sources can produce immersive audio. This also has applications in signal cleaning, for removing reverb or ambient noise.
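A crude stand-in for the idea (not the authors' NMF method, just a toy of my own to show the direct/trail distinction) is to track a decaying peak envelope per frequency bin of a magnitude spectrogram and treat energy lingering well below it as the reverberant "trail":

```python
def split_direct_ambient(spectrogram, decay=0.8, threshold=0.5):
    """Per frequency bin, track a decaying peak envelope; magnitudes
    well below the envelope are treated as reverberant trail and
    routed to the ambient (rear) output.  `spectrogram` is a list of
    frames, each a list of bin magnitudes."""
    n_bins = len(spectrogram[0])
    envelope = [0.0] * n_bins
    direct, ambient = [], []
    for frame in spectrogram:
        d_frame, a_frame = [], []
        for k, mag in enumerate(frame):
            envelope[k] = max(mag, envelope[k] * decay)
            if mag >= threshold * envelope[k]:
                d_frame.append(mag)
                a_frame.append(0.0)
            else:
                d_frame.append(0.0)
                a_frame.append(mag)
        direct.append(d_frame)
        ambient.append(a_frame)
    return direct, ambient
```

The factorisation approach is far more robust because it separates whole spectral components rather than individual bins, but the routing step is the same: direct energy to the front, trail energy to the rear.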
Wavefield synthesis has been developed by the Fraunhofer Institut and the IOSONO working group. The principle is to harness Huygens' principle of wavefront formation by splitting localised sounds into multi-channel audio (up to 200 loudspeakers) such that the superposition of the new wavefield can place the source anywhere in the room, or even outside it (behind the speaker front). We were treated to a demonstration of IOSONO wavefield synthesis in Ilmenau's cinema, one of very few wavefield installations in the world, and basically it shits all over Dolby 5.1 sound. The realism of the localisation is quite frightening. Martin, Andreas and Andreas adapt the OpenAL open standard for audio localisation to offer wavefield synthesis to games platforms through scene descriptors and object parameters in polar (distance/attenuation, angle) form. The system is tested with Unreal2, Pariah, Cold War, Jedi Knight, Soldier of Fortune and Quake 4.
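The delay-and-sum intuition behind driving such an array can be sketched as follows. This is a toy reading of Huygens' principle under my own assumptions; real WFS driving functions involve proper pre-filtering and amplitude terms, not just a 1/r weight.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second

def driving_signals(source, speakers):
    """Per-loudspeaker delay (seconds) and 1/r amplitude weight so the
    superposed wavefronts approximate a point source at `source`.
    Positions are 2D (x, y) tuples in metres."""
    signals = []
    for sp in speakers:
        r = math.dist(source, sp)
        signals.append({"delay": r / SPEED_OF_SOUND,
                        "gain": 1.0 / max(r, 0.1)})  # clamp near-field
    return signals
```

Speakers nearest the virtual source fire earliest and loudest; the listener's ear reconstructs a wavefront that appears to emanate from behind (or, for focused sources, in front of) the array.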
This work on high-quality spatial encoding for ordinary stereo speaker arrangements was described in a paper by the group from Holistiks in Athens, Greece. An enhanced HRTF using multiple binaural channels within the same listening field is complemented with impulse responses for virtual geometry, and treated with crosstalk cancellation so that it performs well on ordinary stereo loudspeakers as well as headphones. Additionally, some efficient algorithms for moving sources are introduced. This system seems rather well suited to games. It's a shame the Greek team did not present an audio demonstration.
One of the more unusual and interesting presentations because of its brutal honesty in presenting negative research results. Nigel and Mats described the failure of a rapid game prototyping project that was a textbook example for software engineering students. The results give lessons that resonate with Sommerville's strictures on timetabling, Weinberg's notes on egoless programming and many other pitfalls understood by experienced project developers, including a lack of prior metrics and a poor requirements specification. There were connections between this presentation and Inger Ekman's talk (below), as warnings about developing on mobile platforms where the closed nature of the business thwarts developers with proprietary obstacles, unreasonable NDA requirements, locked hardware and unreadable source code. Afterwards I spoke to Mats, mentioning that Trolltech's Qtopia Greenphone and the OpenMoko Neo are widely considered realistic platforms for mobile developers, since they have completely open architectures and run Linux.
Inger's talk was about the development of games for mobile applications that involve interactivity with real-world locations through GPS layers. Some very interesting results were given about the problems of interaction with sound during a study of a prototype game. These included correct semantic bindings from a sound design perspective, social issues of having an audio game running over a long period, and difficulties in developing for a mobile platform. Inger also explained how an audio-driven game for sighted people is unusual, since most of the research in this area is directed towards visually impaired people and does not necessarily translate well.
Daniel presents an exploration of resources available for game sound education and details the development of a curriculum at the Interaction Design Department in Zurich. The diversity of skills and terminology is addressed, as well as the multitude of technologies and platforms that make game audio design rather more difficult than production for well-established media such as film and television. Topics range from semiotics and cohesion, to object semantics in terms of power, speed, size, position and so forth, auditory scene analysis and construction, sound description vocabulary as a means of exchange between sound designers, and the nuance of dynamic contexts. In his paper Daniel puts the course objectives into context with practice, contemporary and future, as well as other educational programmes such as IASIG and SAE. It's a shame he was unable to make an oral presentation of this; game audio education and toolchain development is a favorite subject of mine, and the breadth of this work was not adequately captured in a poster session.
Sound design is influenced by a multi-modal approach in which eight distinct modes of listening are considered. The team from Finland present an interpretation of sound design that expands the familiar non-orthogonal groupings of semantic listening, causal listening and reduced listening. They propose a classification system comprising reflexive (base instinctive response), connotative (low-level associations), causal (what is the production mechanism?), empathetic (application to mood or emotion), functional (significance to real-world events or information: does the sound have a purpose, e.g. a telephone bell?), semantic (meaning of the sound by convention), critical (appropriateness of the sound in context) and reduced listening (fundamental analysis of the sound; how most sound designers, myself included, view a sound in terms of abstract signals). These modes are considered in case studies that reflect the workflow of the sound designer and the perception of the audience.
Raymond's talk was an engaging and memorable investigation of the universal effects and roots of music in humans. He talked about the formative psychology of musicianship, the identity of musicians and subjective personal interpretations. Most interesting were the therapeutic aspects, including studies on the effects of music on pain perception and concentration. This talk had particular personal resonance for me, since my girlfriend is involved in trauma therapy with a research interest in psychotherapy through singing. Most relevant to the game audio topic is that almost all tasks accompanied by music are performed better when listening to familiar favorite music. From my own research I know that many players prefer to listen to their own exciting music selections when playing multi-player online games, and in many cases it may be better to allow the player to make their own music selections than to package games with fixed music.
On the characteristics of annoyance in sound signals: this has useful social applications in the noise pollution and abatement fields. While largely unrelated to game applications, it has some relevance to sound design. Daniel and Song Hui presented results of perceptual psychoacoustics experiments to determine which sounds have the greatest annoyance factors. This stood out as an excellent talk by the American team, both for its well-delivered presentation and for its experimental rigor.
This was the second Audio Mostly, the first having been held in Sweden in 2006. The event focuses on advanced audio media technology with a strong emphasis on games and interactive systems. Next year's event will also be in Sweden, on the provisional theme of "Sound in motion".
A big thanks goes to Karlheinz Brandenburg, Philipp Meyer, Holger Grossmann, Yvonne Baro and Hennig Kohler for organisation, and to Katrina Delsing, Stuart Cunningham, Lilian Johansson, Mats Liljedahl, David Moffat, Nigel Papworth and Niklas Roeber, who served on the committee. A personal thanks to Daniel Steele for saving my ass with a laptop loan, Christian Dittmar, Finn Seliger, Song Hui Chon and everyone at the Havana bar who made my birthday a thoroughly drunken affair.