Want to write a new test? (as opposed to an Integration with an editor or CI system)
Some familiarity with Haskell helps. Most checks just use pattern
matching and function calls. Grokking monads is generally not required,
but do-notation may come in handy.
Feel free to skip ahead to ShellCheck in practice.
The ShellCheck wiki can be edited by anyone with a GitHub account.
Feel free to update it with special cases and additional information. If
you are making a significant edit and would like someone to double check
it, you can file an issue with the title "[Wiki] Updated SC1234 to ..."
(and point to this paragraph, since this suggestion is still new).
Here's the basic flow of code through ShellCheck: parsing, then AST analysis, then output formatting.
Of these, AST analysis is the most relevant, and where most of the interesting checks happen.
The parser turns a string into an AST and zero or more warnings.
Parser warnings come in two flavors: problems and notes.
Notes are only emitted when parsing succeeds (they are stored in the
Parsec user state). For example, a note is emitted when adding spaces
around = in assignments, because if the parser later fails
(i.e. it's not actually an assignment), we want to discard the
suggestion:
when (hasLeftSpace || hasRightSpace) $
    parseNoteAt pos ErrorC 1068 "Don't put spaces around the = in assignments."
On the other hand, problems are always emitted, even when parsing
fails (they are stored in a StateT higher than Parsec in
the transformer stack). For example, a problem is emitted if there's an
unescaped linefeed in a [ .. ] expression, because the
statement is likely malformed or unterminated, and we want to show this
warning even if we're unable to parse the whole thing:
when (single && '\n' `elem` space) $
    parseProblemAt pos ErrorC 1080 "When breaking lines in [ ], you need \\ before the linefeed."
In short, notes are for warnings that only make sense if the parse succeeds, while problems are for issues worth reporting even when parsing fails.
The distinction exists because you can often emit useful information even when parsing fails (such as suggestions for how to fix it). Likewise, there are often issues that only make sense in context and shouldn't be emitted if the result does not end up being used. There are probably better solutions for this.
Here are the full types of the parser:
-- SystemInterface m: reads real/mocked files
-- Ms.StateT SystemState m: stores parse problems
type SCBase m = Mr.ReaderT (SystemInterface m) (Ms.StateT SystemState m)
-- UserState: stores parse notes and token offsets
type SCParser m v = ParsecT String UserState (SCBase m) v
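The difference between notes and problems can be sketched with a toy parser. This is not ShellCheck's parser and does not use Parsec: the `Parser` type and the `parseNote`, `parseProblem`, `char` and `demo` names are assumptions made up for illustration. The list of problems is threaded through both success and failure, while notes only exist inside a successful result and are therefore lost when a later step fails.

```haskell
-- Toy model, NOT ShellCheck's real parser: problems survive failure,
-- notes are only available when the whole parse succeeds.
newtype Parser a = Parser
  { runParser :: String -> [String] -> ([String], Maybe (a, String, [String])) }

instance Functor Parser where
  fmap f p = p >>= (pure . f)

instance Applicative Parser where
  pure a = Parser $ \input problems -> (problems, Just (a, input, []))
  pf <*> pa = pf >>= \f -> pa >>= (pure . f)

instance Monad Parser where
  Parser p >>= k = Parser $ \input problems ->
    case p input problems of
      (problems', Nothing) -> (problems', Nothing)
      (problems', Just (a, rest, notes)) ->
        case runParser (k a) rest problems' of
          -- A later failure drops the notes collected so far.
          (problems'', Nothing) -> (problems'', Nothing)
          (problems'', Just (b, rest', notes')) ->
            (problems'', Just (b, rest', notes ++ notes'))

-- Record a note: only visible if the parse ends in success.
parseNote :: String -> Parser ()
parseNote msg = Parser $ \input problems -> (problems, Just ((), input, [msg]))

-- Record a problem: kept no matter what happens afterwards.
parseProblem :: String -> Parser ()
parseProblem msg = Parser $ \input problems -> (problems ++ [msg], Just ((), input, []))

-- Consume one expected character, or fail.
char :: Char -> Parser ()
char c = Parser $ \input problems -> case input of
  (x:rest) | x == c -> (problems, Just ((), rest, []))
  _ -> (problems, Nothing)

-- Emits one note and one problem, then requires an '='.
demo :: Parser ()
demo = do
  parseNote "note: spaces around ="
  parseProblem "problem: unescaped linefeed"
  char '='
```

Running `runParser demo "=" []` yields both messages, while `runParser demo "x" []` fails and keeps only the problem.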
AST analysis comes in two primary flavors: checks that run on the
root node (sometimes called "tree checks"), and checks that run
on every node (sometimes called "node checks"). Due to poor
planning, these can't be distinguished by type because they both just
take a Token parameter.
Here's a simple check designed to run on each node, using pattern matching to find backticks:
checkBackticks _ (T_Backticked id list) | not (null list) =
    style id 2006 "Use $(..) instead of legacy `..`."
checkBackticks _ _ = return ()
A lot of checks are just like this, though usually with a bit more matching logic.
Each check is preceded by some mostly self-explanatory unit tests:
prop_checkBackticks1 = verify checkBackticks "echo `foo`"
prop_checkBackticks2 = verifyNot checkBackticks "echo $(foo)"
prop_checkBackticks3 = verifyNot checkBackticks "echo `#inlined comment` foo"
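To see how a node check and its tests fit together, here is a self-contained sketch on a toy AST. Everything in it is an assumption for illustration: the simplified `Token` type, the `runNodeCheck` driver, and these versions of `verify` and `verifyNot` operate on hand-built trees, whereas the real helpers parse a shell string first and the real check takes an extra parameters argument. The check itself only looks at a single node; the driver is what visits every node.

```haskell
-- Toy AST and helpers for illustration only; ShellCheck's real Token,
-- verify and verifyNot differ in detail.
data Token
  = Script [Token]
  | Command [Token]
  | Literal String
  | Backticked [Token]
  | DollarExpansion [Token]
  deriving (Show, Eq)

type Comment = (Int, String)  -- (code, message)
type NodeCheck = Token -> [Comment]

-- Node check: inspects exactly one node and never recurses itself.
checkBackticks :: NodeCheck
checkBackticks (Backticked list) | not (null list) =
  [(2006, "Use $(..) instead of legacy `..`.")]
checkBackticks _ = []

children :: Token -> [Token]
children (Script ts) = ts
children (Command ts) = ts
children (Backticked ts) = ts
children (DollarExpansion ts) = ts
children (Literal _) = []

-- The driver applies the check to the node and to all its descendants.
runNodeCheck :: NodeCheck -> Token -> [Comment]
runNodeCheck check t = check t ++ concatMap (runNodeCheck check) (children t)

verify, verifyNot :: NodeCheck -> Token -> Bool
verify check = not . null . runNodeCheck check
verifyNot check = null . runNodeCheck check

prop_checkBackticks1, prop_checkBackticks2, prop_checkBackticks3 :: Bool
-- echo `foo`
prop_checkBackticks1 = verify checkBackticks
  (Script [Command [Literal "echo", Backticked [Command [Literal "foo"]]]])
-- echo $(foo)
prop_checkBackticks2 = verifyNot checkBackticks
  (Script [Command [Literal "echo", DollarExpansion [Command [Literal "foo"]]]])
-- echo `#inlined comment` foo: the backticks contain no commands
prop_checkBackticks3 = verifyNot checkBackticks
  (Script [Command [Literal "echo", Backticked [], Literal "foo"]])
```

The third property shows why the check guards on `not (null list)`: backticks that contain only a comment hold no commands, so there is nothing to convert to `$(..)`.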
There are a few specialized test types for efficiency reasons.
For example, many tests trigger only for certain commands. This could
be done by N tests like the above, each matching command nodes and
checking that the command name applies (N node matches, N command name
extractions, N comparisons). It's more efficient to just have 1 node
match, 1 name extraction, and then a map lookup to find one or more
command handlers. Such checks just register to handle a command name,
and can be found in Checks/Command.hs.
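The map-lookup idea can be sketched as follows. This is a toy model rather than the code in Checks/Command.hs: the `Command` record, the three handlers and their messages are hypothetical, and the real handlers work on AST tokens and emit comments through ShellCheck's own machinery.

```haskell
import qualified Data.Map as Map

-- Toy model of a command node: just a name and its arguments.
data Command = Command { cmdName :: String, cmdArgs :: [String] }

type Comment = String
type CommandCheck = Command -> [Comment]

-- Hypothetical handlers, invented for this sketch.
checkRmSlash :: CommandCheck
checkRmSlash cmd = ["rm: refusing to approve deleting /" | "/" `elem` cmdArgs cmd]

checkRmNoArgs :: CommandCheck
checkRmNoArgs cmd = ["rm: no files given" | null (cmdArgs cmd)]

checkGrepNoArgs :: CommandCheck
checkGrepNoArgs cmd = ["grep: no pattern given" | null (cmdArgs cmd)]

-- Handlers register under a command name; several may share one name.
commandChecks :: Map.Map String [CommandCheck]
commandChecks = Map.fromListWith (++)
  [ ("rm", [checkRmSlash])
  , ("rm", [checkRmNoArgs])
  , ("grep", [checkGrepNoArgs])
  ]

-- One name extraction and one map lookup per command node,
-- regardless of how many handlers are registered in total.
runCommandChecks :: Command -> [Comment]
runCommandChecks cmd =
  concatMap ($ cmd) (Map.findWithDefault [] (cmdName cmd) commandChecks)
```

A command with no registered handlers, such as `Command "ls" []`, costs a single failed lookup and produces no comments.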
Similarly, some checks only trigger for a certain shell. This could
be done by N tree checks that optionally iterate the tree, or N node
checks that match a node and skip emitting for certain shells, but it's
more efficient to iterate the tree once with all applicable checks. Such
checks just register to handle nodes for a certain shell, and can be
found in Checks/ShellSupport.hs.
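A rough sketch of the same idea for shell-specific checks: each check registers the shells it applies to, the list is filtered once for the target shell, and the tree is walked a single time with whatever remains. The `Shell` and `Token` types, the two checks and their registrations are toy assumptions for illustration and do not mirror Checks/ShellSupport.hs.

```haskell
data Shell = Sh | Bash | Dash | Ksh deriving (Show, Eq)

-- Toy AST: a script is just a list of words.
data Token = Script [Token] | Word String deriving (Show, Eq)

type NodeCheck = Token -> [String]

-- Hypothetical checks, each looking at a single node.
checkDoubleBracket :: NodeCheck
checkDoubleBracket (Word "[[") = ["[[ ]] is not available in this shell."]
checkDoubleBracket _ = []

checkLocal :: NodeCheck
checkLocal (Word "local") = ["'local' is undefined in POSIX sh."]
checkLocal _ = []

-- Each check registers the shells it should run for.
shellChecks :: [([Shell], NodeCheck)]
shellChecks =
  [ ([Sh, Dash], checkDoubleBracket)
  , ([Sh], checkLocal)
  ]

children :: Token -> [Token]
children (Script ts) = ts
children (Word _) = []

-- Filter the registry once, then iterate the tree once.
runShellChecks :: Shell -> Token -> [String]
runShellChecks shell = go
  where
    applicable = [check | (shells, check) <- shellChecks, shell `elem` shells]
    go t = concatMap ($ t) applicable ++ concatMap go (children t)
```

For a shell with no applicable checks, `applicable` is empty and the traversal does no per-node work beyond walking the tree.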
ShellCheck has multiple output formatters. These take parsing results and output them as JSON, XML or human-readable output. They rarely need tweaking. Anyone looking for a different output format should consider transforming one of the existing ones (with XSLT, Python, etc) instead of writing a new formatter.
Let's say that we have a pet peeve: people who use tmp
as a temporary filename. We want to warn about statements like
sort file > tmp && mv tmp file, and suggest
using mktemp instead.
To get started, clone the ShellCheck repository and run
cabal repl followed by :load ShellCheck.Debug.
This is a development module that offers access to a number of
convenient methods, helpfully listed in Debug.hs:
*ShellCheck.AST> :load ShellCheck.Debug
[...]
[16 of 19] Compiling ShellCheck.Analytics ( src/ShellCheck/Analytics.hs, interpreted )
[17 of 19] Compiling ShellCheck.Analyzer ( src/ShellCheck/Analyzer.hs, interpreted )
[18 of 19] Compiling ShellCheck.Checker ( src/ShellCheck/Checker.hs, interpreted )
[19 of 19] Compiling ShellCheck.Debug ( src/ShellCheck/Debug.hs, interpreted )
Ok, 19 modules loaded.
*ShellCheck.Debug>
Now we can look at the AST for our command:
*ShellCheck.Debug> stringToAst "sort file > tmp"
OuterToken (Id 1) (Inner_T_Annotation [] (OuterToken (Id 15) (Inner_T_Script (OuterToken (Id 0) (Inner_T_Literal "")) [OuterToken (Id 14) (Inner_T_Pipeline [] [OuterToken (Id 12) (Inner_T_Redirecting [OuterToken (Id 11) (Inner_T_FdRedirect "" (OuterToken (Id 10) (Inner_T_IoFile (OuterToken (Id 7) Inner_T_Greater) (OuterToken (Id 9) (Inner_T_NormalWord [OuterToken (Id 8) (Inner_T_Literal "tmp")])))))] (OuterToken (Id 13) (Inner_T_SimpleCommand [] [OuterToken (Id 4) (Inner_T_NormalWord [OuterToken (Id 3) (Inner_T_Literal "sort")]),OuterToken (Id 6) (Inner_T_NormalWord [OuterToken (Id 5) (Inner_T_Literal "file")])])))])])))
The AST node T_Literal id str is an alias for
OuterToken (Id id) (Inner_T_Literal str). GHC outputs the
latter, unfortunately making it a bit difficult to read. However, with
some effort we can see the part we're interested in:
(OuterToken (Id 10) (Inner_T_IoFile (OuterToken (Id 7) Inner_T_Greater) (OuterToken (Id 9) (Inner_T_NormalWord [OuterToken (Id 8) (Inner_T_Literal "tmp")]))))
This would be equivalent to: (TODO: find a way to format it this way automatically)
(T_IoFile (Id 10) (T_Greater (Id 7)) (T_NormalWord (Id 9) [T_Literal (Id 8) "tmp"]))
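The aliasing between the short T_ names and the OuterToken form can be illustrated with GHC pattern synonyms. The following is a cut-down sketch with only four node types, written as an assumption about how such aliases can be set up; the definitions in AST.hs are the authority and cover far more constructors. `redirectTarget` is a hypothetical helper added to show matching.

```haskell
{-# LANGUAGE PatternSynonyms #-}

newtype Id = Id Int deriving (Show, Eq)

-- Cut-down node types; the real AST has many more.
data InnerToken t
  = Inner_T_Literal String
  | Inner_T_NormalWord [t]
  | Inner_T_Greater
  | Inner_T_IoFile t t
  deriving (Show, Eq)

data Token = OuterToken Id (InnerToken Token) deriving (Show, Eq)

-- Bidirectional synonyms: usable both to build and to match tokens.
pattern T_Literal :: Id -> String -> Token
pattern T_Literal id str = OuterToken id (Inner_T_Literal str)

pattern T_NormalWord :: Id -> [Token] -> Token
pattern T_NormalWord id list = OuterToken id (Inner_T_NormalWord list)

pattern T_Greater :: Id -> Token
pattern T_Greater id = OuterToken id Inner_T_Greater

pattern T_IoFile :: Id -> Token -> Token -> Token
pattern T_IoFile id op file = OuterToken id (Inner_T_IoFile op file)

-- Built with the short names...
node :: Token
node = T_IoFile (Id 10) (T_Greater (Id 7)) (T_NormalWord (Id 9) [T_Literal (Id 8) "tmp"])

-- ...and matched with them too.
redirectTarget :: Token -> Maybe Token
redirectTarget (T_IoFile _ _ file) = Just file
redirectTarget _ = Nothing
```

Because `Show` is derived on the underlying data types, `show node` prints the long OuterToken form even though the value was written with the short names, which is the effect described above.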
We can compare this with the definition in AST.hs:
--                v-- Redirection operator (T_Greater)
    | T_IoFile Id Token Token
--                      ^-- Filename (T_NormalWord)
Let's just add a check to Analytics.hs:
checkTmpFilename _ token =
    case token of
        T_IoFile id operator filename ->
            warn id 9999 $ "We found this node: " ++ (show token)
        _ -> return ()
and then append checkTmpFilename to the list of node
checks at the top of the file:
nodeChecks :: [Parameters -> Token -> Writer [TokenComment] ()]
nodeChecks = [
    checkUuoc
    ,checkPipePitfalls
    ,checkForInQuoted
    ...
    ,checkTmpFilename  -- Here
    ]
We can now quick-reload the files with :r, and use
ShellCheck.Debug's shellcheckString to run all of
ShellCheck (minus output formatters):
*ShellCheck.Debug> :r
[...]
[17 of 19] Compiling ShellCheck.Analyzer ( src/ShellCheck/Analyzer.hs, interpreted )
[18 of 19] Compiling ShellCheck.Checker ( src/ShellCheck/Checker.hs, interpreted )
[19 of 19] Compiling ShellCheck.Debug ( src/ShellCheck/Debug.hs, interpreted )
*ShellCheck.Debug> shellcheckString "sort file > tmp"
CheckResult {crFilename = "", crComments = [PositionedComment {pcStartPos = Position {posFile = "", posLine = 1, posColumn = 1}, pcEndPos = Position {posFile = "", posLine = 1, posColumn = 1}, pcComment = Comment {cSeverity = ErrorC, cCode = 9999, cMessage = "We found this node: (OuterToken (Id 10) (Inner_T_IoFile (OuterToken (Id 7) Inner_T_Greater) (OuterToken (Id 9) (Inner_T_NormalWord [OuterToken (Id 8) (Inner_T_Literal "tmp")]))))"}, pcFix = Nothing}]}
We can also build and run to see the check apply as it would
when invoking shellcheck:
cabal run shellcheck - <<< "sort file > tmp"
Alternatively, we can run it in interpreted mode, which is almost as
quick as :r:
./quickrun - <<< "sort file > tmp"
In either case, our warning now shows up:
In - line 1:
sort file > tmp
^-- SC2148: Tips depend on target shell and yours is unknown. Add a shebang.
^-- SC9999: We found this node: (OuterToken (Id 10) (Inner_T_IoFile (OuterToken (Id 7) Inner_T_Greater) (OuterToken (Id 9) (Inner_T_NormalWord [OuterToken (Id 8) (Inner_T_Literal "tmp")]))))
Now we can flesh out the check. See ASTLib.hs and
AnalyzerLib.hs for convenient functions to work with AST
nodes, such as getting the name of an invoked command, getting a list of
flags using canonical flag parsing rules, or in this case, getting the
literal string of a T_NormalWord so that it doesn't matter
if we use > 'tmp', > "tmp" or > "t"'m'p:
checkTmpFilename _ token =
    case token of
        T_IoFile id operator filename ->
            when (getLiteralString filename == Just "tmp") $
                warn (getId filename) 9999 $ "Please use mktemp instead of the filename 'tmp'."
        _ -> return ()
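To make it concrete why the quoting style doesn't matter, here is a toy stand-in for getLiteralString. It is an assumption written for this page, not the function from ASTLib.hs: the token constructors are simplified (no ids, only five node types), and `isTmp` is a hypothetical helper. The idea is that a word yields `Just` its text when every part is constant, and `Nothing` as soon as any part needs expansion.

```haskell
-- Simplified tokens for illustration; the real constructors carry ids
-- and there are many more of them.
data Token
  = T_NormalWord [Token]
  | T_Literal String
  | T_SingleQuoted String
  | T_DoubleQuoted [Token]
  | T_DollarBraced String
  deriving (Show, Eq)

-- Toy stand-in for getLiteralString: concatenate constant parts,
-- give up with Nothing if any part is an expansion.
getLiteralString :: Token -> Maybe String
getLiteralString token = case token of
  T_Literal s -> Just s
  T_SingleQuoted s -> Just s
  T_DoubleQuoted parts -> concat <$> mapM getLiteralString parts
  T_NormalWord parts -> concat <$> mapM getLiteralString parts
  T_DollarBraced _ -> Nothing

-- Hypothetical helper mirroring the comparison in the check above.
isTmp :: Token -> Bool
isTmp word = getLiteralString word == Just "tmp"

-- > tmp
plain :: Token
plain = T_NormalWord [T_Literal "tmp"]

-- > "t"'m'p
mixed :: Token
mixed = T_NormalWord [T_DoubleQuoted [T_Literal "t"], T_SingleQuoted "m", T_Literal "p"]

-- > $tmp
variable :: Token
variable = T_NormalWord [T_DollarBraced "tmp"]
```

Here `isTmp plain` and `isTmp mixed` are both true, while `isTmp variable` is false because the word's value is not known statically.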
We can also prepend a few unit tests that will automatically be
picked up if they start with prop_:
prop_checkTmpFilename1 = verify checkTmpFilename "sort file > tmp"
prop_checkTmpFilename2 = verifyNot checkTmpFilename "sort file > $tmp"
We can run these tests with cabal test, or in
interpreted mode with ./quicktest. If the command exits
with success, it's good to go.
If we wanted to submit this test, we could run
./nextnumber, which will output the next unused SC2xxx code,
e.g. 2213 as of writing.
We now have a completely functional test, yay!
For any questions like "How do I turn an X into a Y?", such as "shell string into an AST", "AST into a CFG" or "AST/CFG/DFA into a GraphViz representation", see Debug.hs. It's very readable, and includes additional useful development information.
You can also find the ShellCheck author (me) on IRC as
koala_man in #haskell@libera.chat.
ShellCheck is a static analysis tool for shell scripts. This page is part of its documentation.