This article looks at the robots.txt file for the ModX Content Management System.
As some of you may be aware, robots.txt is a plain-text file that serves as the standard way of telling spiders/crawlers which areas of a website should not be crawled.
Fortunately, the ModX file structure is kept to a minimum, with only three top-level subdirectories (assets, install, and manager) and a few more underneath each.

Right off the bat we can exclude the install directory from being crawled. In fact, it shouldn't even exist, because you are supposed to remove it after you install ModX.
Next, the manager directory shouldn't be crawled, as there is no reason for it to be. Spiders would hit a wall there anyway, because the manager requires authentication.
So this leaves us with the assets directory.

Any directories that contain code should be disallowed. For example, the snippets, templates, js, modules, and plugins directories should be marked as such. The cache directory should never be crawled because it only holds temporary cache files. The docs directory shouldn't be spidered because it just contains the GPL license.
If you don't want your media files to be crawled, you may also want to disallow the files, flash, and images directories.
# Default modx exclusions
User-agent: *
Disallow: /assets/cache/
Disallow: /assets/docs/
Disallow: /assets/export/
Disallow: /assets/import/
Disallow: /assets/modules/
Disallow: /assets/plugins/
Disallow: /assets/snippets/
Disallow: /install/
Disallow: /manager/
# For sitemap.xml autodiscovery. Uncomment if you have one:
# Sitemap: http://example.com/sitemap.xml
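
Before uploading the file, it can be worth checking that the rules block what you expect. The following is only a minimal sketch, assuming Python 3 and its standard urllib.robotparser module; the helper names (build_parser, is_allowed) and the sample paths are hypothetical and chosen purely for illustration, not taken from ModX itself.

```python
from urllib.robotparser import RobotFileParser

# The active rules from the robots.txt above
# (the commented-out Sitemap line is left out).
ROBOTS_TXT = """\
User-agent: *
Disallow: /assets/cache/
Disallow: /assets/docs/
Disallow: /assets/export/
Disallow: /assets/import/
Disallow: /assets/modules/
Disallow: /assets/plugins/
Disallow: /assets/snippets/
Disallow: /install/
Disallow: /manager/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text in memory, without any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, path: str, user_agent: str = "*") -> bool:
    """Return True if the given user agent may fetch the path."""
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    # Hypothetical paths, used only to exercise the rules.
    sample_paths = (
        "/index.php",
        "/manager/index.php",
        "/install/index.php",
        "/assets/cache/example.php",
        "/assets/images/logo.png",
    )
    for path in sample_paths:
        verdict = "allowed" if is_allowed(parser, path) else "blocked"
        print(f"{path}: {verdict}")
```

Note that Python's parser does plain prefix matching, so a rule such as /manager/ does not cover the bare path /manager without the trailing slash. Real crawlers may interpret edge cases differently, so treat this as a sanity check rather than a guarantee.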