• WP5: Crawler

    WP Leader: SF

    The objective of this WP is to design and implement a crawler that finds both new and updated projects on various sites, searching by name, keywords or short descriptions. Search results return project names, descriptions and pointers to data from SCMs and ticket trackers (where APIs are available). The data will be broken down and transformed into a “standardized” format with metadata about each commit, file, ticket and project, which will then be stored and published both via a web portal (WP6) and via a machine-accessible query API. Software license information will be extracted by directly accessing the code repositories and files found by the tools produced in this WP.
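
    As an illustration of the “standardized” format described above, the following is a minimal sketch of how a source-specific commit could be mapped onto a common record. The class name `CommitRecord`, the function names and all field names are hypothetical and chosen for this example only; the actual schema is to be defined within the WP.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional
import json


@dataclass
class CommitRecord:
    """Hypothetical normalized commit metadata; field names are illustrative."""
    project: str
    scm: str                  # e.g. "git" or "svn"
    revision: str
    author: str
    timestamp: str            # ISO 8601 string, as delivered by the source
    message: str
    files: List[str] = field(default_factory=list)
    license: Optional[str] = None   # filled in when license extraction succeeds


def normalize_commit(project: str, scm: str, raw: dict) -> CommitRecord:
    """Map one raw, source-specific commit dict onto the common record.

    The raw keys ("id", "rev", "author", ...) are assumptions about what a
    crawler front end might deliver; they are not taken from any real API.
    """
    return CommitRecord(
        project=project,
        scm=scm,
        revision=str(raw.get("id") or raw.get("rev") or ""),
        author=raw.get("author", "unknown"),
        timestamp=raw.get("date", ""),
        message=raw.get("message", "").strip(),
        files=sorted(raw.get("files", [])),
        license=raw.get("license"),
    )


def to_json(record: CommitRecord) -> str:
    """Serialize a record for storage or for the query API."""
    return json.dumps(asdict(record), sort_keys=True)
```

    Keeping one flat record type per entity (commit, file, ticket, project) would let the web portal and the query API share the same serialized form, although other layouts are equally possible.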

    Beyond the creation of the crawler, this WP will also provide APIs to continuously track “downstream modifications”, i.e. modifications that might affect an artifact that uses a given set of Open Source projects or components placed under observation. This information will also be made available via RSS feeds.
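
    The tracking described above could be sketched as a filter over crawler updates followed by RSS serialization. This is only a sketch under stated assumptions: the function names, the update dictionary keys (`project`, `summary`, `url`) and the example URLs are hypothetical, and the feed contains just the basic RSS 2.0 channel and item elements.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List


def affected_updates(watchlist: Iterable[str],
                     updates: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the updates that concern a watched project (case-insensitive)."""
    watched = {name.lower() for name in watchlist}
    return [u for u in updates if u["project"].lower() in watched]


def build_rss(title: str, link: str, updates: List[Dict[str, str]]) -> str:
    """Render the given updates as a minimal RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Modifications to watched projects"
    for u in updates:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f'{u["project"]}: {u["summary"]}'
        ET.SubElement(item, "link").text = u["url"]
        ET.SubElement(item, "guid").text = u["url"]
    return ET.tostring(rss, encoding="unicode")


# Example usage with made-up data.
updates = [
    {"project": "LibFoo", "summary": "new release", "url": "http://example.org/libfoo/1"},
    {"project": "other", "summary": "minor fix", "url": "http://example.org/other/1"},
]
feed = build_rss("Watched projects", "http://example.org/feed",
                 affected_updates(["libfoo"], updates))
```

    In this sketch the watchlist is a plain list of project names; a real service would more likely key it on stable project identifiers produced by the crawler.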

    The WP will also provide tools to manage tickets upstream, so that it will be possible to inform third parties about modifications made to a downstream artifact and, eventually, to point the authors of the original projects to such modifications.