Rockin' RegEx

Introduction

Learn It

  • Linux users in large organisations frequently have to handle large sets of data.
    • A security analyst might be trying to detect unauthorised access to a bank's network by looking through log files containing millions of lines.
    • A software engineer at a car manufacturer might be trying to track down a bug and need to pick out specific lines from a program containing thousands of lines of code.
    • A teacher might be trying to kill a misbehaving process running on their MacBook, and need to find it.
  • Whenever there is a large amount of data, manually poring through all of it is time-consuming, and it becomes practically impossible once there are more than a few hundred items.
  • To solve these types of problem, Computer Scientists use Regular Expressions to pick out just the data they want.
  • Today, we'll learn to use this incredibly powerful feature.

Learn It

  • RegexOne.com have produced a superb interactive learning resource to help you get to grips with regular expressions.
  • Use the links below to work through exercises 1-4 in a new tab.
  • Note down the answers to the problems, as these form the basis of your badge task for this lesson.
    1. Introduction to regEx
    2. The 123s
    3. The dot
    4. Matching characters
  • When you're done, you should be able to create regular expressions to search for things in most circumstances.
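  • As a minimal sketch of the ideas in those four exercises, the commands below try each kind of pattern on a few made-up lines of text using grep (introduced in the next section). The sample words and variable names are assumptions for illustration only; they are not the answers to the exercises.

```shell
# Sample lines to search (made up for this sketch).
words='abc123
cat.
can
man
fan
dan'

# Digits: [0-9] matches any single digit, so this keeps lines containing one.
with_digit=$(printf '%s\n' "$words" | grep '[0-9]')

# The dot: . matches any single character, so 'c.n' matches "can".
dot_match=$(printf '%s\n' "$words" | grep 'c.n')

# An escaped dot, \. , matches only a literal full stop.
literal_dot=$(printf '%s\n' "$words" | grep '\.')

# Specific characters: [cmf] matches exactly one of c, m or f.
set_match=$(printf '%s\n' "$words" | grep '[cmf]an')

# Show what each pattern kept.
printf '%s\n' "$with_digit" "$dot_match" "$literal_dot" "$set_match"
```

  • Each pattern is wrapped in single quotes so that the shell passes it to grep unchanged.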

Learn It

  • Lots of programs in Linux support regular expressions.
  • One example is grep.
  • By combining these programs with pipes (see the previous lesson), you can quickly search through filesystems, log files, code and anything else you want.
Click here for pop-up Linux terminal window
  • Maybe we'd like to see all the processes (programs) currently running whose names contain the characters pdf.
  • First, try typing ps by itself, to see the 10 or so processes that are running in the virtual Linux terminal.
  • We can pipe the results of the ps command through grep to keep just the lines we want to see.
  • Try typing:
ps | grep pdf
  • You'll see that only the processes whose lines contain pdf are listed.
  • Maybe we'd like to see a list of all the files in our current directory that contain any of the digits 1 to 3 in their filename.
  • We'll start by navigating over to where there are some files. Type:
cd /home/moveIt/otherDir/badge
  • Type ls to see the files in the directory. There are quite a few.
  • We can pipe the output from ls through grep to only see files that contain numbers in the range we want.
ls | grep '[1-3]'
  • Task: Could you write an expression to show only the files ending with jpg?
  • On Linux filesystems, there is a virtual device which constantly generates random characters.
  • You can see some of the stream by typing cat /dev/urandom.
  • NOTE: This will never stop generating content! Use the keyboard shortcut Ctrl + C to stop the output.
  • It'd be nice to see if the computer ever randomly puts your initials together in one of its lines of output.
  • This is junk data, but could just as easily be a stream of live tweets or status updates on a social network administrator's PC. To let prospective advertisers know how often their brands are mentioned on the network, grep could be used, and the (anonymised) messages sold to help companies market their products.
  • You can try seeing how often your computer's random character generator greets you with hi.
cat /dev/urandom | grep hi
  • Hopefully, you'll have spotted that the output of any program that produces one (e.g. ls, cat, pwd, ps and others) can be piped through grep to sift out the relevant data.
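  • The sketch below recaps the pattern used throughout this section: send some text down a pipe and let grep keep the matching lines. The process listing and file names are made up to stand in for the real output of ps and ls, so the results will differ from those in the virtual terminal.

```shell
# A made-up process listing standing in for the output of ps.
processes='  PID TTY      TIME     CMD
  101 tty1     00:00:00 sh
  117 tty1     00:00:02 pdfviewer
  123 tty1     00:00:00 makepdf
  140 tty1     00:00:00 ps'

# Keep only the lines that contain "pdf" anywhere, as ps | grep pdf does.
pdf_lines=$(printf '%s\n' "$processes" | grep pdf)

# Made-up file names standing in for the output of ls.
files='photo1.jpg
photo7.jpg
notes3.txt
readme.txt'

# Quoting the pattern stops the shell expanding [1-3] before grep sees it.
numbered=$(printf '%s\n' "$files" | grep '[1-3]')

# grep -c counts the matching lines instead of printing them.
count=$(printf '%s\n' "$files" | grep -c '[1-3]')

printf '%s\n' "$pdf_lines" "$numbered" "$count"
```

  • Counting with grep -c is handy when you only need to know how often something appears, such as how many lines of a log mention a particular word.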

Badge It

  • Silver: Complete lessons 1-4 on the RegexOne.com site. Write your solutions for each problem into a text file, and upload it to https://www.BourneToLearn.com to collect the badge.
  • Gold: Complete lessons 5-6, and upload your solutions.
  • Platinum: Complete lessons 7-9, and upload your solutions.