SkySift
Built in collaboration with Jeffrey Xiao, Bilal Ali, and Lorenzo Lucena Maguire
About
SkySift is our implementation of a distributed search engine. We designed and built each component ourselves: a web server based on the HTTP/1.1 protocol, a key-value store, an analytics engine based on MapReduce and Spark, and a web crawler.
Approach
We built SkySift to be a robust search engine that delivers well-ranked, high-quality results over a large, diverse corpus of crawled data. We placed a strong emphasis on the robustness of each individual component, with rigorous error handling and checkpointing across all long-running jobs (crawler, indexer, PageRank). We also wanted a fast, responsive front end with a clean user interface.
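To illustrate the kind of checkpointing described above, here is a minimal sketch in Python. It is not SkySift's actual code: the function names (`save_checkpoint`, `load_checkpoint`, `run_job`), the JSON file format, and the `every` parameter are all assumptions made for the example. The idea is that a long-running job periodically records how far it has gotten, writes that record atomically, and resumes from it after a crash.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach disk before the rename
        os.replace(tmp_path, path)  # readers see the old or the new file, never a partial one
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path, default):
    """Return the saved state, or default if no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path) as f:
        return json.load(f)


def run_job(items, process, checkpoint_path, every=100):
    """Process items in order, saving progress every `every` items (hypothetical helper)."""
    state = load_checkpoint(checkpoint_path, {"next_index": 0})
    for i in range(state["next_index"], len(items)):
        process(items[i])
        state["next_index"] = i + 1
        if state["next_index"] % every == 0:
            save_checkpoint(checkpoint_path, state)
    save_checkpoint(checkpoint_path, state)  # final save so a completed job is not redone
    return state
```

Note that this sketch gives at-least-once processing: items handled after the last saved checkpoint are processed again on restart, so `process` should be idempotent (for example, re-crawling a URL or re-writing the same index entry).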