A new algorithm could significantly speed up web browsing by making caching more effective.
The open-source program, called “SIEVE,” introduces a new way to handle web caching — the process of storing and retrieving objects from a computer’s long-term storage as you encounter them while surfing the internet.
These objects — tiny files stored on your hard drive — include images, logos or entire copies of webpages. When you encounter these elements for the first time, you retrieve them from the server, but they are stored on your hard drive for reuse. The second time you encounter these objects, your browser can retrieve them from your computer’s memory rather than from the server, which saves time and consumes less energy.
But because local storage is limited, cache-eviction algorithms work to decide how long to store objects for, and when to replace older ones less frequently accessed by a user, with newer or more popular ones.
Although many such algorithms exist, SIEVE is a much simpler and effective option that can dramatically speed up web browsing if implemented across the internet, the scientists said in their preprint paper, published Dec. 17, 2023. They plan to present the paper at the 21st USENIX Symposium on Networked Systems Design and Implementation in April.
Related: How could this new type of room-temperature qubit usher in the next phase of quantum computing?
“A main reason why computers and the internet are fast at all is the cache. We feel software caches are this ubiquitous and yet underappreciated pillar that enable the modern web to function, and so working on them can have outsized impact,” co-first author of the paper Yazhuo Zhang, a doctoral student at Emory University in Atlanta, told Live Science.
Testing a new approach to web caching
First-in, first-out (FIFO) algorithms work by adding new objects in sequence to a “conveyor belt” to oblivion. When objects reach the end of the line, they’re removed. Less recently used (LRU) is another method in which objects move along the conveyor belt as in FIFO, but if an object is requested again, it jumps back to the front. More sophisticated variations exist, but the more complex they are, the more bugs they have, Zhang said. SIEVE, by contrast, was implemented with fewer than 20 lines of code.
SIEVE uses the same conveyor belt mechanism, but objects are labeled “zero” to begin with. When an object is requested again, its status changes to “one” and it joins the front of the line. Objects are evicted as normal when they reach the end. This is known as “lazy promotion.” Meanwhile, a “moving hand” that scans the length of the belt and loops back to the beginning, is programmed to remove any object labeled “zero.” This sieve-like function is called “quick demotion.” The scientists said SIEVE is the simplest algorithm that achieves both lazy promotion and quick demotion.
They conducted 1,500 separate tests against nine state-of-the-art algorithms using real caching histories based on tracked web-cache traces from Meta, Wikimedia, X and four other sources. One trace, for example, consisted of 2.8 billion web requests made to access media on Wikipedia in 2019. Together, the 1,500 traces comprised 247 billion requests to nearly 15 billion objects.
They were looking for a low “miss ratio,” or the fraction of objects fetched from the web versus storage, where a “miss” is considered fetching an object from the web — the lower, the better. No single algorithm is expected to have the lowest miss ratio in every test, but SIEVE was the best-performing in 45% of the tests, Zhang told Live Science. The next best algorithm, by contrast, was the top performer in just 15%.
SIEVE has already been implemented on more than 10 popular libraries that fuel modern apps and websites. Many sites may soon upgrade to SIEVE “without much effort,” Zhang said. She added that Meta is about to evaluate SIEVE in production, while Google has also expressed interest in adopting SIEVE alongside other web companies.
“This is remarkable and unusual traction,” Zhang said. No cache algorithm in the past 20 years has seen broad uptake across multiple production systems.”