Solr/Lucene on EC2/EBS

WARNING: To get this running quickly, I'm disabling security settings (steps 10 and 14) - don't use this setup for a public service.
  1. AWS Console (Instances): "Launch Instance" - ami-b51ff8dc (latest alestic-64 jaunty) m1.large.
  2. AWS Console (Volumes): "Create Volume" - 50GB EBS in the same availability zone as the instance, then "Attach Volume" to the instance at /dev/sdf.
  3. Copy the "Public DNS" address of the instance from the AWS Console (referred to as {PUBLIC-DNS.amazonaws.com} from now on).
  4. Connect using SSH to root@{PUBLIC-DNS.amazonaws.com}.
  5. Update the system and install the required packages:
    aptitude update
    aptitude safe-upgrade
    aptitude install xfsprogs tomcat6 apache2 libxpp3-java
  6. Format and mount the EBS volume:
    mkfs.xfs /dev/sdf
    echo "/dev/sdf /solr xfs noatime 0 0" >> /etc/fstab
    mkdir /solr
    mount /solr
  7. scp your Solr conf folder (containing solrconfig.xml and schema.xml) and solr.war into /solr/
  8. Set dataDir in /solr/conf/solrconfig.xml to /solr.
  9. scp your Tomcat context file into /etc/tomcat6/Catalina/localhost/
  10. Set TOMCAT6_SECURITY=no in /etc/default/tomcat6 *warning: insecure*
  11. /etc/init.d/tomcat6 restart (tail -c 5000 -f /var/log/tomcat6/localhost.*.log for errors).
  12. Add proxy settings to the VirtualHost block of /etc/apache2/sites-enabled/000-default:
    ProxyPass /tomcat http://localhost:8080
    ProxyPassReverse /tomcat http://localhost:8080
  13. a2enmod proxy_http
  14. Comment out "Deny from all" in /etc/apache2/mods-enabled/proxy.conf *warning: insecure*
  15. apache2ctl restart (tail -f /var/log/apache2/error.log for errors)
  16. Test URLs:
    1. http://{PUBLIC-DNS.amazonaws.com}/
    2. http://{PUBLIC-DNS.amazonaws.com}/tomcat
    3. http://{PUBLIC-DNS.amazonaws.com}/tomcat/solr
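
Step 9 assumes you already have a Tomcat context file. If you don't, here is a minimal sketch of one, written out by a small shell function. The file name solr.xml, the function name write_solr_context, and the choice of /solr as both the location of solr.war and the Solr home are my assumptions (they match steps 7 and 8 and the /tomcat/solr test URL); adjust them to your own layout.

```shell
#!/bin/sh
# Sketch: write a minimal Tomcat context file for Solr (step 9).
# Assumptions (not from the steps above): the file is called solr.xml,
# so Tomcat serves the webapp at /solr, and solr.war plus the conf
# folder both live under the Solr home directory.

write_solr_context() {
    # $1: target directory, e.g. /etc/tomcat6/Catalina/localhost
    # $2: Solr home holding conf/ and solr.war (default: /solr)
    target_dir=$1
    solr_home=${2:-/solr}

    # docBase points Tomcat at the war file; the solr/home JNDI entry
    # tells Solr where to find its conf folder.
    cat > "$target_dir/solr.xml" <<EOF
<Context docBase="$solr_home/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="$solr_home" override="true"/>
</Context>
EOF
}
```

On the instance you would call it as write_solr_context /etc/tomcat6/Catalina/localhost and then restart Tomcat as in step 11.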

Once you've posted data to Solr, you can umount /dev/sdf, detach the EBS volume and shut down the instance (store an image of it on S3 first if you want to reuse it later). The EBS volume, and the Lucene index on it, will remain available for attaching to new instances, and you can create snapshots at any time.
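
The teardown can also be scripted rather than clicked through in the console. The sketch below only prints the commands so you can review them before running anything. It assumes the AWS CLI ("aws") is installed and configured, which the steps above do not require; the function name print_ebs_teardown and the volume ID in the usage note are hypothetical. I stop Tomcat first so that no index files are open when the volume is unmounted.

```shell
#!/bin/sh
# Sketch: print (not run) the commands to release the EBS volume.
# Assumption: the AWS CLI is available; the original steps use the console.

print_ebs_teardown() {
    # $1: EBS volume ID (you must supply your own)
    # $2: mount point of the volume (default: /solr)
    volume_id=$1
    mount_point=${2:-/solr}

    # Stop Tomcat so Solr releases the index files before unmounting.
    echo "/etc/init.d/tomcat6 stop"
    echo "umount $mount_point"
    # Snapshot after unmounting so the filesystem is in a consistent state.
    echo "aws ec2 create-snapshot --volume-id $volume_id --description 'Solr index'"
    echo "aws ec2 detach-volume --volume-id $volume_id"
}
```

For example, print_ebs_teardown vol-0123456789abcdef0 prints the four commands for that (made-up) volume ID; once you are happy with them, pipe the output to sh.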