Childhood communication disorders

For children with speech and language disorders, early-childhood intervention can make a great difference in their later academic and social success. But many such children — one study estimates 60 percent — go undiagnosed until kindergarten or even later.

Researchers at the Computer Science and Artificial Intelligence Laboratory at MIT and Massachusetts General Hospital’s Institute of Health Professions hope to change that, with a computer system that can automatically screen young children for speech and language disorders and, potentially, even provide specific diagnoses.

This week, at the Interspeech conference on speech processing, the researchers reported on an initial set of experiments with their system, which yielded promising results. “We’re nowhere near finished with this work,” says John Guttag, the Dugald C. Jackson Professor in Electrical Engineering and senior author on the new paper. “This is sort of a preliminary study. But I think it’s a pretty convincing feasibility study.”

The system analyzes audio recordings of children’s performances on a standardized storytelling test, in which they are presented with a series of images and an accompanying narrative, and then asked to retell the story in their own words.

“The really exciting idea here is to be able to do screening in a fully automated way using very simplistic tools,” Guttag says. “You could imagine the storytelling task being totally done with a tablet or a phone. I think this opens up the possibility of low-cost screening for large numbers of children, and I think that if we could do that, it would be a great boon to society.”

Subtle signals

The researchers evaluated the system’s performance using a standard measure called area under the curve, which describes the tradeoff between exhaustively identifying members of a population who have a particular disorder, and limiting false positives. (Modifying the system to limit false positives generally results in limiting true positives, too.) In the medical literature, a diagnostic test with an area under the curve of about 0.7 is generally considered accurate enough to be useful; on three distinct clinically useful tasks, the researchers’ system ranged between 0.74 and 0.86.
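
For readers unfamiliar with the metric, the snippet below shows how an area-under-the-curve score is computed from a classifier's outputs using scikit-learn. The labels and scores are invented for illustration and have nothing to do with the study's data.

```python
# Minimal AUC illustration with synthetic labels and scores (not the study's data).
from sklearn.metrics import roc_auc_score

# 1 = child flagged with an impairment, 0 = typically developing (hypothetical labels)
y_true = [0, 0, 0, 1, 1, 0, 1, 1, 0, 1]
# Classifier scores: higher means the model is more confident an impairment is present
y_score = [0.1, 0.3, 0.35, 0.4, 0.8, 0.2, 0.7, 0.9, 0.5, 0.65]

# roc_auc_score sweeps the decision threshold and summarizes the
# true-positive / false-positive tradeoff as a single number in [0, 1].
print(roc_auc_score(y_true, y_score))  # 0.96 for these made-up numbers
```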

To build the new system, Guttag and Jen Gong, a graduate student in electrical engineering and computer science and first author on the new paper, used machine learning, in which a computer searches large sets of training data for patterns that correspond to particular classifications — in this case, diagnoses of speech and language disorders.

The training data had been amassed by Jordan Green and Tiffany Hogan, researchers at the MGH Institute of Health Professions, who were interested in developing more objective methods for assessing results of the storytelling test. “Better diagnostic tools are needed to help clinicians with their assessments,” says Green, himself a speech-language pathologist. “Assessing children’s speech is particularly challenging because of high levels of variation even among typically developing children. You get five clinicians in the room and you might get five different answers.”

Unlike speech impediments that result from anatomical characteristics such as cleft palates, speech disorders and language disorders both have neurological bases. But, Green explains, they affect different neural pathways: Speech disorders affect the motor pathways, while language disorders affect the cognitive and linguistic pathways.

Telltale pauses

Green and Hogan had hypothesized that pauses in children’s speech, as they struggled to either find a word or string together the motor controls required to produce it, were a source of useful diagnostic data. So that’s what Gong and Guttag concentrated on. They identified a set of 13 acoustic features of children’s speech that their machine-learning system could search, seeking patterns that correlated with particular diagnoses. These were things like the number of short and long pauses, the average length of the pauses, the variability of their length, and similar statistics on uninterrupted utterances.
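
As a rough sketch of what such feature extraction can look like (not the authors' actual pipeline), the code below computes pause and utterance statistics from a hypothetical segmentation of a recording into labeled, timed segments; the one-second cutoff between short and long pauses is an arbitrary choice for the example.

```python
# Illustrative pause-feature extraction from a hypothetical segmentation of a
# recording into (label, duration_in_seconds) pairs; not the authors' code.
from statistics import mean, pstdev

segments = [("speech", 2.1), ("pause", 0.4), ("speech", 1.3),
            ("pause", 1.8), ("speech", 0.9), ("pause", 0.6)]

pauses = [d for label, d in segments if label == "pause"]
utterances = [d for label, d in segments if label == "speech"]

features = {
    "num_short_pauses": sum(1 for d in pauses if d < 1.0),   # 1-second cutoff is arbitrary
    "num_long_pauses": sum(1 for d in pauses if d >= 1.0),
    "mean_pause_len": mean(pauses),
    "pause_len_std": pstdev(pauses),
    "mean_utterance_len": mean(utterances),
    "utterance_len_std": pstdev(utterances),
}
print(features)
```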

The children whose performances on the storytelling task were recorded in the data set had been classified as typically developing, as suffering from a language impairment, or as suffering from a speech impairment. The machine-learning system was trained on three different tasks: identifying any impairment, whether speech or language; identifying language impairments; and identifying speech impairments.

One obstacle the researchers had to confront was that the age range of the typically developing children in the data set was narrower than that of the children with impairments: Because impairments are comparatively rare, the researchers had to venture outside their target age range to collect data.

A role for inhibitory neurons

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory have developed a new computational model of a neural circuit in the brain, which could shed light on the biological role of inhibitory neurons — neurons that keep other neurons from firing.

The model describes a neural circuit consisting of an array of input neurons and an equivalent number of output neurons. The circuit performs what neuroscientists call a “winner-take-all” operation, in which signals from multiple input neurons induce a signal in just one output neuron.

Using the tools of theoretical computer science, the researchers prove that, within the context of their model, a certain configuration of inhibitory neurons provides the most efficient means of enacting a winner-take-all operation. Because the model makes empirical predictions about the behavior of inhibitory neurons in the brain, it offers a good example of the way in which computational analysis could aid neuroscience.
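
The model itself is mathematical, but the dynamic it studies can be caricatured in a few lines of code: a shared inhibitory signal, driven by the most active output, suppresses every output except the one receiving the strongest input. The toy simulation below is meant only to convey that idea, not to reproduce the paper's construction.

```python
# Toy winner-take-all dynamic: a shared inhibitory signal suppresses all but the
# most strongly driven output. A caricature of the idea, not the paper's model.
def winner_take_all(inputs, steps=20, inhibition_gain=1.0):
    activity = list(inputs)  # initial output-neuron activity mirrors the inputs
    for _ in range(steps):
        inhibition = inhibition_gain * max(activity)      # global inhibitory feedback
        activity = [max(0.0, a + i - inhibition) for a, i in zip(activity, inputs)]
    return [1 if a > 0 else 0 for a in activity]          # 1 = the surviving "winner"

print(winner_take_all([0.2, 0.9, 0.5, 0.7]))  # -> [0, 1, 0, 0]
```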

The researchers will present their results this week at the conference on Innovations in Theoretical Computer Science. Nancy Lynch, the NEC Professor of Software Science and Engineering at MIT, is the senior author on the paper. She’s joined by Merav Parter, a postdoc in her group, and Cameron Musco, an MIT graduate student in electrical engineering and computer science.

For years, Lynch’s group has studied communication and resource allocation in ad hoc networks — networks whose members are continually leaving and rejoining. But recently, the team has begun using the tools of network analysis to investigate biological phenomena.

“There’s a close correspondence between the behavior of networks of computers or other devices like mobile phones and that of biological systems,” Lynch says. “We’re trying to find problems that can benefit from this distributed-computing perspective, focusing on algorithms for which we can prove mathematical properties.”

Artificial neurology

In recent years, artificial neural networks — computer models roughly based on the structure of the brain — have been responsible for some of the most rapid improvement in artificial-intelligence systems, from speech transcription to face recognition.

An artificial neural network consists of “nodes” that, like individual neurons, have limited information-processing power but are densely interconnected. Data are fed into the first layer of nodes. If the data received by a given node meet some threshold criterion — for instance, if they exceed a particular value — the node “fires,” or sends signals along all of its outgoing connections.

Each of those outgoing connections, however, has an associated “weight,” which can augment or diminish a signal. Each node in the next layer of the network receives weighted signals from multiple nodes in the first layer; it adds them together, and again, if their sum exceeds some threshold, it fires. Its outgoing signals pass to the next layer, and so on.
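
That weighted-sum-and-threshold behavior can be written out directly. The toy two-layer network below uses hand-picked weights and thresholds purely to make the description concrete; real networks learn their weights from data.

```python
# A toy two-layer threshold network: each node sums its weighted inputs and
# "fires" (outputs 1) only if the sum exceeds its threshold.
def layer(inputs, weights, thresholds):
    outputs = []
    for node_weights, threshold in zip(weights, thresholds):
        total = sum(w * x for w, x in zip(node_weights, inputs))
        outputs.append(1 if total > threshold else 0)
    return outputs

x = [0.6, 0.2, 0.9]                                   # data fed to the first layer
hidden = layer(x, weights=[[0.5, -0.4, 0.8], [1.0, 0.3, -0.2]],
               thresholds=[0.5, 0.4])
output = layer(hidden, weights=[[0.7, 0.6]], thresholds=[0.5])
print(hidden, output)                                 # -> [1, 1] [1]
```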

Making big data manageable

One way to handle big data is to shrink it. If you can identify a small subset of your data set that preserves its salient mathematical relationships, you may be able to perform useful analyses on it that would be prohibitively time consuming on the full set.

The methods for creating such “coresets” vary according to application, however. Last week, at the Annual Conference on Neural Information Processing Systems, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory and the University of Haifa in Israel presented a new coreset-generation technique that’s tailored to a whole family of data analysis tools with applications in natural-language processing, computer vision, signal processing, recommendation systems, weather prediction, finance, and neuroscience, among many others.

“These are all very general algorithms that are used in so many applications,” says Daniela Rus, the Andrew and Erna Viterbi Professor of Electrical Engineering and Computer Science at MIT and senior author on the new paper. “They’re fundamental to so many problems. By figuring out the coreset for a huge matrix for one of these tools, you can enable computations that at the moment are simply not possible.”

As an example, in their paper the researchers apply their technique to a matrix — that is, a table — that maps every article on the English version of Wikipedia against every word that appears on the site. That’s 1.4 million articles, or matrix rows, and 4.4 million words, or matrix columns.

That matrix would be much too large to analyze using low-rank approximation, an algorithm that can deduce the topics of free-form texts. But with their coreset, the researchers were able to use low-rank approximation to extract clusters of words that denote the 100 most common topics on Wikipedia. The cluster that contains “dress,” “brides,” “bridesmaids,” and “wedding,” for instance, appears to denote the topic of weddings; the cluster that contains “gun,” “fired,” “jammed,” “pistol,” and “shootings” appears to designate the topic of shootings.
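
At a vastly smaller scale, this kind of topic extraction can be reproduced with off-the-shelf tools. The sketch below applies truncated SVD, the workhorse behind low-rank approximation and latent semantic analysis, to a tiny word-count matrix; it illustrates the analysis itself, not the researchers' coreset construction.

```python
# Topic extraction on a tiny document-word count matrix via truncated SVD
# (the machinery behind low-rank approximation and latent semantic analysis).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "bride wedding dress bridesmaids",
    "wedding dress brides ceremony",
    "gun pistol fired shootings",
    "pistol jammed fired gun",
]
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)          # documents x words matrix
svd = TruncatedSVD(n_components=2).fit(counts)   # keep only 2 latent "topics"

words = vectorizer.get_feature_names_out()
for topic in svd.components_:
    top_words = [w for _, w in sorted(zip(topic, words), reverse=True)[:4]]
    print(top_words)                             # highest-weight words per topic
```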

Joining Rus on the paper are Mikhail Volkov, an MIT postdoc in electrical engineering and computer science, and Dan Feldman, director of the University of Haifa’s Robotics and Big Data Lab and a former postdoc in Rus’s group.

The researchers’ new coreset technique is useful for a range of tools with names like singular-value decomposition, principal-component analysis, and latent semantic analysis. But what they all have in common is dimension reduction: They take data sets with large numbers of variables and find approximations of them with far fewer variables.

In this, these tools are similar to coresets. But coresets are application-specific, while dimension-reduction tools are general-purpose. That generality makes them much more computationally intensive than coreset generation — too computationally intensive for practical application to large data sets.

The researchers believe that their technique could be used to winnow a data set with, say, millions of variables — such as descriptions of Wikipedia pages in terms of the words they use — to merely thousands. At that point, a widely used technique like principal-component analysis could reduce the number of variables to mere hundreds, or even lower.
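
At toy scale, that two-stage workflow looks roughly like the sketch below, with plain uniform row sampling standing in for the real coreset step (which weights its samples so that the small set provably approximates the full matrix).

```python
# Two-stage reduction at toy scale: shrink the data first, then run PCA on the result.
# Uniform row sampling stands in for the coreset step; the real method weights its
# samples so that the reduced set provably approximates the full matrix.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.normal(size=(100_000, 50))       # stand-in for a much larger matrix

sample = data[rng.choice(len(data), size=2_000, replace=False)]  # stage 1: shrink
pca = PCA(n_components=10).fit(sample)      # stage 2: fit PCA on the small set
reduced = pca.transform(data)               # project everything onto 10 components
print(reduced.shape)                        # (100000, 10)
```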


Fighting bias in machine learning

When Joy Buolamwini, an MIT master’s candidate in media arts and sciences, sits in front of a mirror, she sees a black woman in her 20s. But when her photo is run through facial recognition software, it does not recognize her face. A seemingly neutral machine programmed with algorithms (codified processes) simply fails to detect her features. Buolamwini is, she says, “on the wrong side of computational decisions” that can lead to exclusionary and discriminatory practices and behaviors in society.

That phenomenon, which Buolamwini calls the “coded gaze,” is what motivated her late last year to launch the Algorithmic Justice League (AJL) to highlight such bias through provocative media and interactive exhibitions; to provide space for people to voice concerns and experiences with coded discrimination; and to develop practices for accountability during the design, development, and deployment phases of coded systems.

That work helped the Media Lab student earn the grand prize in the professional category of The Search for Hidden Figures. The nationwide contest, created by PepsiCo and 21st Century Fox in partnership with the New York Academy of Sciences, is named for a recently released film that tells the real-life story of three African-American women at NASA whose mathematical brilliance helped launch the United States into the space race in the early 1960s.

“I’m honored to receive this recognition, and I’ll use the prize to continue my mission to show compassion through computation,” says Buolamwini, who was born in Canada, then lived in Ghana and, at the age of four, moved to Oxford, Mississippi. She’s a two-time recipient of an Astronaut Scholarship in a program established by NASA’s Mercury 7 crew members, including the late astronaut John Glenn, who is depicted in the film “Hidden Figures.”

The film had a big impact on Buolamwini when she saw a special MIT sneak preview in early December: “I witnessed the power of storytelling to change cultural perceptions by highlighting hidden truths. After the screening where I met Margot Lee Shetterly, who wrote the book on which the film is based, I left inspired to tell my story, and applied for the contest. Being selected as a grand prize winner provides affirmation that pursuing STEM is worth celebrating. And it’s an important reminder to share the stories of discriminatory experiences that necessitate the Algorithmic Justice League as well as the uplifting stories of people who come together to create a world where technology can work for all of us and drive social change.”

The Search for Hidden Figures contest attracted 7,300 submissions from students across the United States. As one of two grand prize winners, Buolamwini receives a $50,000 scholarship, a trip to the Kennedy Space Center in Florida, plus access to New York Academy of Sciences training materials and programs in STEM. She plans to use the prize resources to develop what she calls “bias busting” tools to help defeat bias in machine learning.

That is the focus of her current research at the MIT Media Lab, where Buolamwini is in the Civic Media group pursuing a master’s degree with an eye toward a PhD. “The Media Lab serves as a unifying thread in my journey in STEM. Until I saw the lab on TV, I didn’t realize there was a place dedicated to exploring the future of humanity and technology by allowing us to indulge our imaginations by continuously asking, ‘What if?’”

Before coming to the Media Lab, Buolamwini earned a BS in computer science as a Stamps President’s Scholar at Georgia Tech and a master’s in learning and technology as a Rhodes Scholar at Oxford University. As part of her Rhodes Scholar Service Year, Buolamwini launched Code4Rights to guide young people in partnering with local organizations to develop meaningful technology for their communities. In that year, she also built upon a computer science learning initiative she’d created during her Fulbright fellowship in Lusaka, Zambia, to empower young people to become creators of technology. And, as an entrepreneur, she co-founded a startup hair care technology company and now advises an MIT-connected “smart” clothing startup aimed at transforming women’s health. She’s also an experienced public speaker, most recently at TEDx Beacon Street, the White House, the Vatican, and the Museum of Fine Arts, Boston.

From an early age, Buolamwini felt encouraged to aim high in STEM: “I went after my dreams, and now I continue to push myself beyond present barriers to create a more inclusive future. Inclusive participation matters. And, by being visible in STEM, I hope to inspire the next generation of stargazers.”

Art and technology

Garrett Parrish grew up singing and dancing as a theater kid, influenced by his older siblings, one of whom is an actor and the other a stage manager. But by the time he reached high school, Parrish had branched out significantly, drumming in his school’s jazz ensemble and helping to build a state-championship-winning robot.

MIT was the first place Parrish felt he was able to work meaningfully at the nexus of art and technology. “Being a part of the MIT culture, and having the resources that are available here, are really what opened my mind to that intersection,” the MIT senior says. “That’s always been my goal from the beginning: to be as emotionally educated as I am technically educated.”

Parrish, who is majoring in mechanical engineering, has collaborated on a dizzying array of projects, ranging from app-building, to assistant directing, to a robotic opera. Driving his work is an interest in shaping technology to serve others.

“The whole goal of my life is to fix all the people problems. I sincerely think that the biggest problems we have are how we deal with each other, and how we treat each other. [We need to be] promoting empathy and understanding, and technology is an enormous power to influence that in a good way,” he says.

Technology for doing good

Parrish began his academic career at Harvard University and transferred to MIT after his first year. Frustrated at how little power individuals often have in society, Parrish joined DoneGood co-founders Scott Jacobsen and Cullen Schwartz, and became the startup’s chief technology officer his sophomore year. “We kind of distilled our frustrations about the way things are into, ‘How do you actionably use people’s existing power to create real change?’” Parrish says.

The DoneGood app and Chrome extension help consumers find businesses that share their priorities and values, such as paying a living wage or using organic ingredients. The extension monitors a user’s online shopping and recommends alternatives. The mobile app offers a directory of local options and national brands that users can filter according to their values. “The two things that everyday people have at their disposal to create change are how they spend their time and how they spend their money. We direct money away from brands that aren’t sustainable, therefore creating an actionable incentive for them to become more sustainable,” Parrish says.

DoneGood has raised its first round of funding, and became a finalist in the MIT $100K Entrepreneurship Competition last May. The company now has five full-time employees, and Parrish continues to work as CTO part-time. “It’s been a really amazing experience to be in such an important leadership role. And to take something from the ground up, and really figure out what is the best way to actually create the change you want,” Parrish says. “Where technology meets cultural influence is very interesting, and it’s a space that requires a lot of responsibility and perspective.”

Robotic spectaculars

Parrish also loves building physical objects, and his mechanical engineering major has provided a path to many of his creative projects. “Part of my enjoyment comes from building things with [my] hands and being able to actually work in the physical world, and by studying mechanical engineering you get an invaluable understanding of how the physical world works,” he says. “I also believe strongly in the powers of computers to do things, so combining the two of [these areas] — basically programming mechanical things — is where I think I can get the most enjoyment.”

Easier querying and filtering of data

The age of big data has seen a host of new techniques for analyzing large data sets. But before any of those techniques can be applied, the target data has to be aggregated, organized, and cleaned up.

That turns out to be a shockingly time-consuming task. In a 2016 survey, 80 data scientists told the company CrowdFlower that, on average, they spent 80 percent of their time collecting and organizing data and only 20 percent analyzing it.

An international team of computer scientists hopes to change that, with a new system called Data Civilizer, which automatically finds connections among many different data tables and allows users to perform database-style queries across all of them. The results of the queries can then be saved as new, orderly data sets that may draw information from dozens or even thousands of different tables.

“Modern organizations have many thousands of data sets spread across files, spreadsheets, databases, data lakes, and other software systems,” says Sam Madden, an MIT professor of electrical engineering and computer science and faculty director of MIT’s bigdata@CSAIL initiative. “Civilizer helps analysts in these organizations quickly find data sets that contain information that is relevant to them and, more importantly, combine related data sets together to create new, unified data sets that consolidate data of interest for some analysis.”

The researchers presented their system last week at the Conference on Innovative Data Systems Research. The lead authors on the paper are Dong Deng and Raul Castro Fernandez, both postdocs at MIT’s Computer Science and Artificial Intelligence Laboratory; Madden is one of the senior authors. They’re joined by six other researchers from Technical University of Berlin, Nanyang Technological University, the University of Waterloo, and the Qatar Computing Research Institute. Although he’s not a co-author, MIT adjunct professor of electrical engineering and computer science Michael Stonebraker, who in 2014 won the Turing Award — the highest honor in computer science — contributed to the work as well.

Pairs and permutations

Data Civilizer assumes that the data it’s consolidating is arranged in tables. As Madden explains, in the database community, there’s a sizable literature on automatically converting data to tabular form, so that wasn’t the focus of the new research. Similarly, while the prototype of the system can extract tabular data from several different types of files, getting it to work with every conceivable spreadsheet or database program was not the researchers’ immediate priority. “That part is engineering,” Madden says.

The system begins by analyzing every column of every table at its disposal. First, it produces a statistical summary of the data in each column. For numerical data, that might include a distribution of the frequency with which different values occur; the range of values; and the “cardinality” of the values, or the number of different values the column contains. For textual data, a summary would include a list of the most frequently occurring words in the column and the number of different words. Data Civilizer also keeps a master index of every word occurring in every table and the tables that contain it.

Then the system compares all of the column summaries against each other, identifying pairs of columns that appear to have commonalities — similar data ranges, similar sets of words, and the like. It assigns every pair of columns a similarity score and, on that basis, produces a map, rather like a network diagram, that traces out the connections between individual columns and between the tables that contain them.
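
A stripped-down version of that profiling-and-matching pass might look like the sketch below. The tables, the column signatures, and the Jaccard-style similarity measure are all simplifications invented for illustration; Data Civilizer's actual summaries and scoring are richer.

```python
# Sketch of column profiling and pairwise matching in the spirit of Data Civilizer;
# the real system builds richer statistical summaries and similarity scores.
from itertools import combinations

def profile(column):
    """Summarize one column: its distinct values, cardinality, and range."""
    values = set(column)
    return {"cardinality": len(values), "values": values,
            "min": min(column), "max": max(column)}

def similarity(p, q):
    """Jaccard-style overlap between the distinct values of two columns."""
    union = p["values"] | q["values"]
    return len(p["values"] & q["values"]) / max(len(union), 1)

tables = {
    ("employees", "dept"): ["sales", "hr", "sales", "eng"],
    ("budgets", "department"): ["sales", "eng", "legal"],
    ("budgets", "amount"): [120, 90, 75],
}
profiles = {name: profile(col) for name, col in tables.items()}

# Score every pair of columns; high-scoring pairs become edges in the linkage map.
for (a, pa), (b, pb) in combinations(profiles.items(), 2):
    print(a, b, round(similarity(pa, pb), 2))
```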

The coolest things that happened at the Computer Science and Artificial Intelligence Laboratory in 2016

Machines that predict the future, robots that patch wounds, and wireless emotion-detectors are just a few of the exciting projects that came out of MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) this year. Here’s a sampling of 16 highlights from 2016 that span the many computer science disciplines that make up CSAIL.

Robots for exploring Mars — and your stomach

  • A team led by CSAIL director Daniela Rus developed an ingestible origami robot that unfolds in the stomach to patch wounds and remove swallowed batteries.
  • Researchers are working on NASA’s humanoid robot, “Valkyrie,” which will be programmed for trips into outer space and for performing tasks autonomously.
  • A 3-D printed robot was made of both solids and liquids and printed in a single step, with no assembly required.

Keeping data safe and secure

  • CSAIL hosted a cyber summit that convened members of academia, industry, and government, including featured speakers Admiral Michael Rogers, director of the National Security Agency; and Andrew McCabe, deputy director of the Federal Bureau of Investigation.
  • Researchers came up with a system for staying anonymous online that uses less bandwidth to transfer large files between anonymous users.
  • A deep-learning system called AI2 was shown to be able to predict 85 percent of cyberattacks with the help of some human input.

Advancements in computer vision

  • A new imaging technique called Interactive Dynamic Video lets you reach in and “touch” objects in videos using a normal camera.
  • Researchers from CSAIL and Israel’s Weizmann Institute of Science produced a movie display called Cinema 3D that uses special lenses and mirrors to allow viewers to watch 3-D movies in a theater without having to wear those clunky 3-D glasses.
  • A new deep-learning algorithm can predict human interactions more accurately than ever before, by training itself on footage from TV shows like “Desperate Housewives” and “The Office.”
  • A group from MIT and Harvard University developed an algorithm that may help astronomers produce the first image of a black hole, stitching together telescope data to essentially turn the planet into one large telescope dish.

Tech to help with health

  • A team produced a robot that can help schedule and assign tasks by learning from humans, in fields like medicine and the military.
  • Researchers came up with an algorithm for identifying organs in fetal MRI scans to extensively evaluate prenatal health.
  • A wireless device called EQ-Radio can tell if you’re excited, happy, angry, or sad, by measuring breathing and heart rhythms.

A popular compiler yields more efficient parallel code

Compilers are programs that convert computer code written in high-level languages intelligible to humans into low-level instructions executable by machines.

But there’s more than one way to implement a given computation, and modern compilers extensively analyze the code they process, trying to deduce the implementations that will maximize the efficiency of the resulting software.

Code explicitly written to take advantage of parallel computing, however, usually loses the benefit of compilers’ optimization strategies. That’s because managing parallel execution requires a lot of extra code, and existing compilers add it before the optimizations occur. The optimizers aren’t sure how to interpret the new code, so they don’t try to improve its performance.

At the Association for Computing Machinery’s Symposium on Principles and Practice of Parallel Programming next week, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory will present a new variation on a popular open-source compiler that optimizes before adding the code necessary for parallel execution.

As a consequence, says Charles E. Leiserson, the Edwin Sibley Webster Professor in Electrical Engineering and Computer Science at MIT and a coauthor on the new paper, the compiler “now optimizes parallel code better than any commercial or open-source compiler, and it also compiles where some of these other compilers don’t.”

That improvement comes purely from optimization strategies that were already part of the compiler the researchers modified, which was designed to compile conventional, serial programs. The researchers’ approach should also make it much more straightforward to add optimizations specifically tailored to parallel programs. And that will be crucial as computer chips add more and more “cores,” or parallel processing units, in the years ahead.

The idea of optimizing before adding the extra code required by parallel processing has been around for decades. But “compiler developers were skeptical that this could be done,” Leiserson says.

“Everybody said it was going to be too hard, that you’d have to change the whole compiler. And these guys,” he says, referring to Tao B. Schardl, a postdoc in Leiserson’s group, and William S. Moses, an undergraduate double major in electrical engineering and computer science and physics, “basically showed conventional wisdom to be flat-out wrong. The big surprise was that this didn’t require rewriting the 80-plus compiler passes that do either analysis or optimization. T.B. and Billy did it by modifying 6,000 lines of a 4-million-line code base.”

Schardl, who earned his PhD in electrical engineering and computer science (EECS) from MIT, with Leiserson as his advisor, before rejoining Leiserson’s group as a postdoc, and Moses, who will graduate next spring after only three years, with a master’s in EECS to boot, share authorship on the paper with Leiserson.

Forks and joins

A typical compiler has three components: the front end, which is tailored to a specific programming language; the back end, which is tailored to a specific chip design; and what computer scientists oxymoronically call the middle end, which uses an “intermediate representation,” compatible with many different front and back ends, to describe computations. In a standard, serial compiler, optimization happens in the middle end.

The researchers’ chief innovation is an intermediate representation that employs a so-called fork-join model of parallelism: At various points, a program may fork, or branch out into operations that can be performed in parallel; later, the branches join back together, and the program executes serially until the next fork.

In the current version of the compiler, the front end is tailored to a fork-join language called Cilk, pronounced “silk” but spelled with a C because it extends the C programming language. Cilk was a particularly congenial choice because it was developed by Leiserson’s group — although its commercial implementation is now owned and maintained by Intel. But the researchers might just as well have built a front end tailored to the popular OpenMP or any other fork-join language.
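
In Cilk, the programmer marks fork and join points with keywords such as cilk_spawn and cilk_sync. The same fork-join shape can be sketched in Python with the standard concurrent.futures module, used here purely to illustrate the pattern rather than anything about the compiler itself.

```python
# Fork-join in miniature: fork two branches that run concurrently, then join before
# continuing serially. A Python stand-in for the pattern Cilk expresses with
# cilk_spawn / cilk_sync; it says nothing about the compiler internals themselves.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(lo, hi):
    return sum(range(lo, hi))

with ThreadPoolExecutor() as pool:
    left = pool.submit(partial_sum, 0, 500_000)           # fork
    right = pool.submit(partial_sum, 500_000, 1_000_000)  # fork
    total = left.result() + right.result()                # join: wait for both branches

print(total)  # equals sum(range(1_000_000)); execution is serial again after the join
```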

Automatically handling database caching

Today, loading a web page on a big website usually involves a database query — to retrieve the latest contributions to a discussion you’re participating in, a list of news stories related to the one you’re reading, links targeted to your geographic location, or the like.

But database queries are time consuming, so many websites store — or “cache” — the results of common queries on web servers for faster delivery.

If a site user changes a value in the database, however, the cache needs to be updated, too. The complex task of analyzing a website’s code to identify which operations necessitate updates to which cached values generally falls to the web programmer. Missing one such operation can result in an unusable site.

This week, at the Association for Computing Machinery’s Symposium on Principles of Programming Languages, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory presented a new system that automatically handles caching of database queries for web applications written in the web-programming language Ur/Web.

Although a website may be fielding many requests in parallel — sending different users different cached data, or even data cached on different servers — the system guarantees that, to the user, every transaction will look exactly as it would if requests were handled in sequence. So a user won’t, for instance, click on a link showing that tickets to an event are available, only to find that they’ve been snatched up when it comes time to pay.

In experiments involving two websites that had been built using Ur/Web, the new system’s automatic caching offered twofold and 30-fold speedups.

“Most very popular websites backed by databases don’t actually ask the database over and over again for each request,” says Adam Chlipala, an associate professor of electrical engineering and computer science at MIT and senior author on the conference paper. “They notice that, ‘Oh, I seem to have asked this question quite recently, and I saved the result, so I’ll just pull that out of memory.’”

“But the tricky part here is that you have to realize when you make changes to the database that some of your saved answers are no longer necessarily correct, and you have to do what’s called ‘invalidating’ them. And in the mainstream way of implementing this, the programmer needs to manually add invalidation logic. For every line of code that changes the database, the programmer has to sit down and think, ‘Okay, for every other line of code that reads the database and saves the result in a cache, which ones of those are going to be broken by the change I just made?’”
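
In schematic Python rather than Ur/Web, the hand-written invalidation logic Chlipala describes looks something like the sketch below; the tickets table, the queries, and the db object are all invented for illustration.

```python
# Hand-rolled query caching with manual invalidation, in schematic form.
# The "tickets" table, the SQL, and the db object are invented for illustration.
cache = {}

def available_tickets(db, event_id):
    key = ("available_tickets", event_id)
    if key not in cache:                      # cache miss: ask the database once
        cache[key] = db.query(
            "SELECT count(*) FROM tickets WHERE event = ? AND sold = 0", event_id)
    return cache[key]                         # cache hit: skip the database entirely

def sell_ticket(db, ticket_id, event_id, buyer):
    db.execute("UPDATE tickets SET sold = 1, buyer = ? WHERE id = ?", buyer, ticket_id)
    # The step programmers forget: every cached answer this write could have changed
    # must be discarded ("invalidated"), or readers will keep seeing stale availability.
    cache.pop(("available_tickets", event_id), None)
```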

Chlipala is joined on the paper by Ziv Scully, a graduate student in computer science at Carnegie Mellon University, who worked in Chlipala’s lab as an MIT undergraduate.

Exhaustive search

Ur/Web, which Chlipala invented, lets web developers completely specify their sites’ functionality using just one programming language. The Ur/Web compiler then automatically generates all the different types of code required to power a website — HTML, JavaScript, SQL database queries, and cascading style sheets — while providing certain performance and security guarantees. Chlipala and Scully’s new system is a modification of the compiler, so Ur/Web users can simply recompile their existing code to get all the benefits of database caching. The language itself remains unchanged.