The protocol
We decided to create a protocol that makes solving this problem easy. First, how does it work? Research we found indicates that about 20 milliseconds is the most delay you can have before those laptops start to sound out of sync. Let's demonstrate. Use the interactive tool below to see where delays appear in sound waves if you have even a little bit of lag. You can also see how it gets even worse if one of the devices has to stop to load or buffer at any point.
Luckily, fixing this isn't hard. All you have to do is monitor the waves and keep them in sync. Unfortunately, syncing takes time, and even frequent syncing leaves an unacceptable amount of sound interference from tiny sources of delay. Try it below by enabling random lag and reacting to the lag by syncing manually.
As the example showed, even syncing every time there was lag meant the sound waves were out of sync most of the time. The solution is for the players themselves to sync to each other periodically, whether or not any lag has been noticed. Try it below by enabling random lag and periodic sync.
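For readers without access to the interactive tool, the effect of periodic sync can be approximated with a toy simulation. This is only a sketch: the 10 ms tick, the 5% chance of a stall, and the stall length of up to 30 ms are invented for illustration, and only the 20 ms limit comes from the text above.

```python
import random

THRESHOLD_MS = 20.0  # audible limit quoted above


def out_of_sync_fraction(ticks, sync_every=None, seed=0):
    """Fraction of ticks in which the follower lags by more than THRESHOLD_MS.

    Toy model (all numbers are assumptions): on each tick the follower has a
    5% chance of stalling for 0 to 30 ms. A sync resets the accumulated lag.
    sync_every is a positive number of ticks, or None to never sync.
    """
    rng = random.Random(seed)
    offset_ms = 0.0
    bad_ticks = 0
    for tick in range(1, ticks + 1):
        if rng.random() < 0.05:
            offset_ms += rng.uniform(0.0, 30.0)
        if sync_every is not None and tick % sync_every == 0:
            offset_ms = 0.0  # periodic sync, applied whether or not lag occurred
        if offset_ms > THRESHOLD_MS:
            bad_ticks += 1
    return bad_ticks / ticks


if __name__ == "__main__":
    print("never synced:   ", out_of_sync_fraction(10_000))
    print("sync every 50:  ", out_of_sync_fraction(10_000, sync_every=50))
```

Because both runs use the same seed, they see the same stalls, so the periodically synced run can never spend more time out of sync than the unsynced one.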
The periodic protocol keeps the sound waves in sync and recovers quickly from periods of lag. Now that you understand our technology, you're probably interested in how we made it. As you can imagine by now, the challenge was in minimizing sources of error and keeping sync times low. The next section will discuss some technical details - feel free to skip it.
The process
We began by planning the architecture of our solution. We decided that it would have two components: a time synchronization component running on the local system, and a separate component made up of four threads that handle different functionality. The four threads are audio sync, which manages audio playback; network communication, which sends and receives messages across the network and transmits instructions to the other threads; system time sync, which ensures that system clocks are synchronized; and user input, which hosts a web client allowing end-users to manage playlists and client configuration.
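The way the network communication role passes instructions to the other roles can be sketched with one inbox queue per role. This is an illustration only, not our actual code: the use of Python threads, the role names as strings, and the `run_demo` helper are all assumptions made for the example.

```python
import queue
import threading

# Roles that receive instructions from the network communication role.
ROLES = ("audio_sync", "time_sync", "user_input")


def worker(name, inbox, log):
    """Handle instructions for one role until a None sentinel arrives."""
    while True:
        message = inbox.get()
        if message is None:
            break
        log.append((name, message))  # stand-in for real work such as seeking


def run_demo(instructions):
    """Route (role, message) pairs to per-role threads and return what ran."""
    inboxes = {name: queue.Queue() for name in ROLES}
    log = []
    workers = [
        threading.Thread(target=worker, args=(name, inbox, log))
        for name, inbox in inboxes.items()
    ]
    for thread in workers:
        thread.start()

    def network():
        # Hypothetical network role: forward each instruction, then shut down.
        for role, message in instructions:
            inboxes[role].put(message)
        for inbox in inboxes.values():
            inbox.put(None)

    net = threading.Thread(target=network)
    net.start()
    net.join()
    for thread in workers:
        thread.join()
    return sorted(log)


if __name__ == "__main__":
    print(run_demo([("audio_sync", "seek"), ("user_input", "enqueue")]))
```

Giving each role its own queue means the network role never blocks on a slow receiver.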
The Precision Time Protocol, or PTP, is advertised as supporting synchronization at the nanosecond level, so we investigated it first. Two drawbacks indicated early on that it might not be optimal for our specific use case. First, it assumes a constant delay and tries to adjust for that delay. We estimated that this assumption would broadly not hold on a standard WiFi network, and the documentation seemed to agree. Second, its most specialized backend, 802.1AS, would require a specialized network topology to fully use, so we had to fall back to direct UDP messages, which would limit the potential accuracy.
After determining that PTP wouldn't be sufficiently accurate over a wireless link, we started to look into the Network Time Protocol, NTP, as an alternative. Although NTP typically yields a less accurate sync than PTP, it is designed to handle synchronization over a vast number of hops with high latency. Like PTP, it assumes a constant delay, but it uses a simpler method for measuring that delay. To compensate for NTP's lower accuracy, we chose a popular NTP implementation called Chrony. Chrony implements many PTP-like features to try to improve accuracy, but it carefully accounts for high-latency networks and uses a statistical regression to account for spikes in latency. In our testing, Chrony yielded a sufficiently accurate sync.
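The delay measurement that NTP relies on can be illustrated with the standard four-timestamp exchange. The sketch below is not Chrony's implementation, which adds filtering and regression on top. The function name and the millisecond units are chosen for the example. The offset estimate is exact only when the network delay is the same in both directions, which is the assumption that breaks down on a busy wireless link.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from one NTP exchange.

    t0: client clock when the request was sent
    t1: server clock when the request arrived
    t2: server clock when the reply was sent
    t3: client clock when the reply arrived
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2  # how far the server is ahead
    delay = (t3 - t0) - (t2 - t1)         # time spent on the network
    return offset, delay


if __name__ == "__main__":
    # Client clock is 5 ms behind the server, each direction takes 10 ms,
    # and the server spends 1 ms before replying.
    print(ntp_offset_and_delay(t0=0, t1=15, t2=16, t3=21))
```

In the example the server is 5 ms ahead of the client and the round trip takes 20 ms on the network.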

To handle audio playback, we used libVLC, which is a C-based API for the popular VLC media player. We configured libVLC to use a playlist-based media player. To handle synchronization between the server and the clients, the server device periodically sends a deadline message indicating where in the track it expects to be at a certain point in the future. This is like the periodic sync that you tried earlier. Once the clients receive this message, they compute their own future position and determine whether the offset exceeds a certain threshold. If it does, the clients adjust their position as needed to sync to each other.
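The client-side check described above can be sketched as follows. The message fields, the function name, and the 20 ms threshold are assumptions made for this example rather than our actual wire format. The sketch also assumes playback at normal speed and clocks already synchronized by the time sync component.

```python
from dataclasses import dataclass

SYNC_THRESHOLD_MS = 20  # assumed threshold, borrowed from the audible limit


@dataclass(frozen=True)
class Deadline:
    """Hypothetical deadline message sent by the server."""
    wall_clock_ms: int  # synchronized clock time the deadline refers to
    position_ms: int    # where the server expects playback to be then


def correction_ms(deadline, now_ms, local_position_ms,
                  threshold_ms=SYNC_THRESHOLD_MS):
    """Return how far the client should seek, or 0 if it is close enough.

    A positive result means the client is behind and should skip ahead.
    """
    # Where this client will be at the deadline if it keeps playing normally.
    predicted_ms = local_position_ms + (deadline.wall_clock_ms - now_ms)
    offset_ms = deadline.position_ms - predicted_ms
    return offset_ms if abs(offset_ms) > threshold_ms else 0


if __name__ == "__main__":
    deadline = Deadline(wall_clock_ms=10_000, position_ms=65_000)
    print(correction_ms(deadline, now_ms=9_500, local_position_ms=64_470))
    print(correction_ms(deadline, now_ms=9_500, local_position_ms=64_495))
```

With libVLC, the local position would come from `libvlc_media_player_get_time` and a non-zero correction would be applied with `libvlc_media_player_set_time`. Ignoring small offsets avoids audible seeks when the client is already within tolerance.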
The web server used for configuration and queueing is shown in the next section. Architecturally, there is one main thread that sends authoritative update messages to all clients.
The product
Below is an example recording of two audio streams that start out of sync and then sync up. We recorded it by putting the sound from two devices into separate stereo channels, so make sure you're wearing headphones or you won't be able to tell them apart.
Following are screenshots of the configuration interface mentioned earlier.


