Raccoon - A tool for privately generating reports from data feeds

You either build your own information processing pipelines or you become subject to someone else's.

I wanted a tool that would:

  • Fetch data from the Internet at defined intervals over Tor.
  • Let me display that data in a configurable way.
  • Let me extend its functionality over time.

After investigating a few options I could find nothing that fit quite right.

Usage

Create a new folder to hold all of your feeds, e.g.:

	mkdir feeds
	cd feeds

Create some feeds. Each feed is a subfolder that contains a file named feedinfo, e.g.:

	mkdir openprivacy-blog
	echo "https://openprivacy.ca/feed.xml 1440" > openprivacy-blog/feedinfo 

For now, feedinfo contains just a single line: the URL of the feed, followed by how often to check for updates, in minutes (here, 1440 means once per day).
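
To make the two-field layout concrete, here is a minimal sketch that writes a feedinfo file and reads it back. It only illustrates the format described above and is not how raccoon itself parses the file; the scratch directory and the variable names `feeds_dir`, `url` and `minutes` are assumptions made for the example.

```shell
# Illustrative only: create a feed in a scratch directory and read its feedinfo back.
feeds_dir=$(mktemp -d)
mkdir "$feeds_dir/openprivacy-blog"
echo "https://openprivacy.ca/feed.xml 1440" > "$feeds_dir/openprivacy-blog/feedinfo"

# Split the single line into its two whitespace-separated fields.
read -r url minutes < "$feeds_dir/openprivacy-blog/feedinfo"

echo "url=$url"
echo "interval=$minutes minutes ($((minutes / 60)) hours)"
```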

You can now run raccoon update to fetch all of your feeds over Tor. It will only attempt to fetch feeds that have not been checked within their update period.
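
This README does not describe how raccoon records when a feed was last checked, so the following is only a sketch of the comparison implied above. The helper `is_due` and its arguments are hypothetical names, not part of raccoon.

```shell
# Hypothetical helper: succeeds if a feed last checked at $1 (epoch seconds),
# with an update period of $2 minutes, is due again at time $3 (epoch seconds).
is_due() {
    last_checked=$1
    period_minutes=$2
    now=$3
    [ $((now - last_checked)) -ge $((period_minutes * 60)) ]
}

now=$(date +%s)
# Checked 25 hours ago with a 1440-minute period: due.
if is_due $((now - 90000)) 1440 "$now"; then echo "due"; fi
# Checked 1 hour ago with a 1440-minute period: not due.
if ! is_due $((now - 3600)) 1440 "$now"; then echo "not due"; fi
```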

To produce a report, run raccoon report <report.template>. Raccoon prints a Markdown/HTML hybrid that can be piped to a utility like markdown to produce an HTML report.

The report.template file can be customized as needed. Some basic guidelines on the format:

  • Lines beginning with % are ignored

  • Lines beginning with < or # are printed as-is (for injecting HTML or Markdown, respectively)

  • Report lines have the following (rough) format:

      <folder-name> ALL|DAY|WEEK|[0-9]* (Title|Link|Description)*
    
  • Each report line can either print out a list of ALL items in the feed, or all items from the last DAY or the last WEEK, or a specific feed item (counting from 0)

  • For each feed item you can print out the Title, Link & Description or any combination.
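
As a small illustration of the guidelines above, the sketch below writes a template and, if the tools are installed, renders it. The file name example.template, the heading text and the choice of fields are made up for this example, and it assumes you are in the feeds folder with the openprivacy-blog feed from earlier already fetched.

```shell
# Write a small, illustrative template (the heading and layout are invented).
cat > example.template <<'EOF'
% Lines starting with % are comments and are ignored
# Daily Report
<hr/>
openprivacy-blog DAY Title Link
openprivacy-blog 0 Title Link Description
EOF

# Render the report only if raccoon and a markdown converter are available.
if command -v raccoon >/dev/null 2>&1 && command -v markdown >/dev/null 2>&1; then
    raccoon report example.template | markdown > report.html
fi
```

The first report line lists the title and link of every item from the last day; the second prints the title, link and description of item 0 only.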

Please see the report.template provided in this repository for a more complete example.

You can also download images (or technically any other file) using a file called images in the feed's directory (see the pt-reyes folder). This is useful if you want to download specific data (like satellite images) from a resource that updates fairly often.

Notes

You will need a local Tor proxy running on port 9050 (the tor daemon's default SOCKS port).

There is very little in the way of graceful error handling. Contributions are appreciated; please also feel free to submit issues and feature requests.