libStatGen Software  1
FastQFile Class Reference

Class for reading/validating a fastq file.

#include <FastQFile.h>

Public Member Functions

 FastQFile (int minReadLength=10, int numPrintableErrors=20)
 Constructor.
 
void disableMessages ()
 Disable messages - do not write to cout.
 
void enableMessages ()
 Enable messages - write to cout.
 
void disableSeqIDCheck ()
 Disable Unique Sequence ID checking (Unique Sequence ID checking is enabled by default).
 
void enableSeqIDCheck ()
 Enable Unique Sequence ID checking.
 
void interleaved ()
 Treat the file as interleaved (off by default).
 
void setMaxErrors (int maxErrors)
 Set the number of errors after which to quit reading/validating a file (default: -1).
 
FastQStatus::Status openFile (const char *fileName, BaseAsciiMap::SPACE_TYPE spaceType=BaseAsciiMap::UNKNOWN)
 Open a FastQFile.
 
FastQStatus::Status closeFile ()
 Close a FastQFile.
 
bool isOpen ()
 Check to see if the file is open.
 
bool isEof ()
 Check to see if the file is at the end of the file.
 
bool keepReadingFile ()
 Returns whether to keep reading the file; returns false at end of file or if there was a problem reading the file.
 
FastQStatus::Status validateFastQFile (const String &filename, bool printBaseComp, BaseAsciiMap::SPACE_TYPE spaceType, bool printQualAvg=false)
 Validate the specified fastq file.
 
FastQStatus::Status readFastQSequence ()
 Read 1 FastQSequence, validating it.
 

Public Sequence Line variables.

Keep public variables for a sequence's line so they can be accessed without having to do string copies.

String myRawSequence
 
String mySequenceIdLine
 
String mySequenceIdentifier
 
String myPlusLine
 
String myQualityString
 
BaseAsciiMap::SPACE_TYPE getSpaceType ()
 Get the space type used for this file.
 

Detailed Description

Class for reading/validating a fastq file.

Definition at line 29 of file FastQFile.h.

Constructor & Destructor Documentation

◆ FastQFile()

FastQFile::FastQFile (int minReadLength = 10, int numPrintableErrors = 20)

Constructor.

Parameters
minReadLength  The minimum length that a base sequence must be for it to be valid.
numPrintableErrors  The maximum number of errors that should be reported in detail before suppressing the errors.

Definition at line 30 of file FastQFile.cpp.

FastQFile::FastQFile(int minReadLength, int numPrintableErrors)
    : myFile(NULL),
      myBaseComposition(),
      myQualPerCycle(),
      myCountPerCycle(),
      myCheckSeqID(true),
      myInterleaved(false),
      myPrevSeqID(""),
      myMinReadLength(minReadLength),
      myNumPrintableErrors(numPrintableErrors),
      myMaxErrors(-1),
      myDisableMessages(false),
      myFileProblem(false)
{
    // Reset the member data.
    reset();
}

Member Function Documentation

◆ enableSeqIDCheck()

void FastQFile::enableSeqIDCheck ( )

Enable Unique Sequence ID checking.

(Unique Sequence ID checking is enabled by default).

Definition at line 71 of file FastQFile.cpp.

void FastQFile::enableSeqIDCheck()
{
    myCheckSeqID = true;
}
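
The page does not show how uniqueness is checked once myCheckSeqID is set. As a rough illustration only, the sketch below remembers every identifier seen so far in a hash set and reports repeats. SeqIdTracker and recordIfUnique are hypothetical names, and this is not the libStatGen implementation.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical helper, NOT the libStatGen implementation: remembers every
// sequence identifier seen so far and reports repeats.
class SeqIdTracker
{
public:
    // Returns true if id has not been seen before, false if it is a repeat.
    bool recordIfUnique(const std::string& id)
    {
        // insert() returns a pair; .second is false when id was already stored.
        return mySeen.insert(id).second;
    }

private:
    std::unordered_set<std::string> mySeen;
};
```

A tracker of this kind holds one entry per read, so its memory use grows with the size of the file; that cost is one plausible reason to call disableSeqIDCheck() on very large inputs.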

◆ openFile()

FastQStatus::Status FastQFile::openFile (const char *fileName, BaseAsciiMap::SPACE_TYPE spaceType = BaseAsciiMap::UNKNOWN)

Open a FastQFile.

spaceType selects the space to validate against: BASE_SPACE, COLOR_SPACE, or UNKNOWN (the default).

Definition at line 92 of file FastQFile.cpp.

FastQStatus::Status FastQFile::openFile(const char* fileName,
                                        BaseAsciiMap::SPACE_TYPE spaceType)
{
    // reset the member data.
    reset();

    myBaseComposition.resetBaseMapType();
    myBaseComposition.setBaseMapType(spaceType);
    myQualPerCycle.clear();
    myCountPerCycle.clear();

    FastQStatus::Status status = FastQStatus::FASTQ_SUCCESS;

    // Close the file if there is already one open - checked by close.
    status = closeFile();
    if(status == FastQStatus::FASTQ_SUCCESS)
    {
        // Successfully closed a previously opened file if there was one.

        // Open the file
        myFile = ifopen(fileName, "rt");
        myFileName = fileName;

        if(myFile == NULL)
        {
            // Failed to open the file.
            status = FastQStatus::FASTQ_OPEN_ERROR;
        }
    }

    if(status != FastQStatus::FASTQ_SUCCESS)
    {
        // Failed to open the file.
        std::string errorMessage = "ERROR: Failed to open file: ";
        errorMessage += fileName;
        logMessage(errorMessage.c_str());
    }
    return(status);
}

IFILE ifopen(const char *filename, const char *mode, InputFile::ifileCompression compressionMode=InputFile::DEFAULT)
 Open a file with the specified name and mode, using a filename of "-" to indicate stdin/stdout. Definition: InputFile.h:562

void BaseComposition::setBaseMapType(BaseAsciiMap::SPACE_TYPE spaceType)
 Set the base map type for this composition.

void BaseComposition::resetBaseMapType()
 Reset the base map type for this composition.

FastQStatus::Status closeFile()
 Close a FastQFile. Definition: FastQFile.cpp:134

FastQStatus::Status
 Return value enum for the FastQFile class methods, indicating success or error codes. Definition: FastQStatus.h:31

FastQStatus::FASTQ_SUCCESS
 Indicates method finished successfully. Definition: FastQStatus.h:32

FastQStatus::FASTQ_OPEN_ERROR
 Means the file could not be opened. Definition: FastQStatus.h:35

References closeFile(), FastQStatus::FASTQ_OPEN_ERROR, FastQStatus::FASTQ_SUCCESS, ifopen(), BaseComposition::resetBaseMapType(), and BaseComposition::setBaseMapType().

Referenced by validateFastQFile().
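
For callers who want to drive the reader themselves rather than call validateFastQFile(), the member functions listed on this page suggest an open, read-until-done, close sequence. The sketch below shows that sequence under stated assumptions: FastQStatus here is a cut-down stand-in (its numeric values are not the library's), StubReader is a hypothetical stub that mimics the FastQFile member names without touching a file, and countSequences is a hypothetical helper.

```cpp
#include <cassert>

// Stand-in so the sketch compiles without libStatGen. With the real library,
// include <FastQFile.h> instead and delete FastQStatus and StubReader.
struct FastQStatus
{
    enum Status { FASTQ_SUCCESS, FASTQ_INVALID, FASTQ_OPEN_ERROR };
};

// Hypothetical reader exposing the same member names as FastQFile.
// It "reads" a fixed number of records instead of opening a file.
struct StubReader
{
    int remaining;
    bool open;

    explicit StubReader(int records) : remaining(records), open(false) {}

    FastQStatus::Status openFile(const char* fileName)
    {
        open = (fileName != nullptr && fileName[0] != '\0');
        return open ? FastQStatus::FASTQ_SUCCESS : FastQStatus::FASTQ_OPEN_ERROR;
    }
    bool keepReadingFile() { return open && remaining > 0; }
    FastQStatus::Status readFastQSequence()
    {
        --remaining;
        return FastQStatus::FASTQ_SUCCESS;
    }
    FastQStatus::Status closeFile()
    {
        open = false;
        return FastQStatus::FASTQ_SUCCESS;
    }
};

// Open, read every sequence, close. Returns the number of sequences read,
// or -1 if the file could not be opened.
template <class Reader>
int countSequences(Reader& reader, const char* fileName)
{
    if (reader.openFile(fileName) != FastQStatus::FASTQ_SUCCESS)
    {
        return -1;
    }

    int numSequences = 0;
    while (reader.keepReadingFile())
    {
        FastQStatus::Status status = reader.readFastQSequence();
        if (status != FastQStatus::FASTQ_SUCCESS &&
            status != FastQStatus::FASTQ_INVALID)
        {
            break;  // an error other than "invalid sequence": stop reading
        }
        ++numSequences;  // valid or invalid, a sequence was read
    }
    reader.closeFile();
    return numSequences;
}
```

With the real class, the public sequence-line variables such as mySequenceIdentifier and myQualityString would be read inside the loop after each successful readFastQSequence() call.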

◆ setMaxErrors()

void FastQFile::setMaxErrors ( int  maxErrors)

Set the number of errors after which to quit reading/validating a file (default: -1).

Parameters
maxErrors  Number of errors before quitting. -1 (the default) means do not quit until the entire file has been read/validated; 0 means quit without reading/validating anything.

Definition at line 85 of file FastQFile.cpp.

void FastQFile::setMaxErrors(int maxErrors)
{
    myMaxErrors = maxErrors;
}
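
The effect of the three kinds of value can be checked against the loop guard in validateFastQFile(), which continues while (myMaxErrors == -1) || (myMaxErrors > myNumErrors). The sketch below isolates that expression in a free function; keepValidating and checkMaxErrorsCases are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Stand-alone model of the error-count part of the loop guard in
// validateFastQFile(). Hypothetical helper, not part of libStatGen.
bool keepValidating(int maxErrors, int numErrors)
{
    return (maxErrors == -1) || (maxErrors > numErrors);
}

// Walks through the documented cases.
void checkMaxErrorsCases()
{
    assert(keepValidating(-1, 1000));  // -1: never quit because of the error count
    assert(!keepValidating(0, 0));     // 0: quit before reading anything
    assert(keepValidating(3, 2));      // below the limit: keep going
    assert(!keepValidating(3, 3));     // limit reached: stop
}
```

The real loop also requires keepReadingFile() to be true, so reading ends at end of file or on a read problem regardless of this setting.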

◆ validateFastQFile()

FastQStatus::Status FastQFile::validateFastQFile (const String &filename, bool printBaseComp, BaseAsciiMap::SPACE_TYPE spaceType, bool printQualAvg = false)

Validate the specified fastq file.

Parameters
filename  fastq file to be validated.
printBaseComp  whether or not to print the base composition for the file. true means print it, false means do not.
spaceType  the spaceType to use for validation - BASE_SPACE, COLOR_SPACE, or UNKNOWN (UNKNOWN means to determine the spaceType to validate against from the first character of the first sequence).
printQualAvg  whether or not to print the quality averages for the file. true means print it, false (default) means do not.
Returns
the fastq validation status; FASTQ_SUCCESS for a successfully validated fastq file.

Definition at line 204 of file FastQFile.cpp.

FastQStatus::Status FastQFile::validateFastQFile(const String& filename,
                                                 bool printBaseComp,
                                                 BaseAsciiMap::SPACE_TYPE spaceType,
                                                 bool printQualAvg)
{
    // Open the fastqfile.
    if(openFile(filename, spaceType) != FastQStatus::FASTQ_SUCCESS)
    {
        // Failed to open the specified file.
        return(FastQStatus::FASTQ_OPEN_ERROR);
    }

    // Track the total number of sequences that were validated.
    int numSequences = 0;

    // Keep reading the file until there are no more fastq sequences to process
    // and not configured to quit after a certain number of errors or there
    // has not yet been that many errors.
    // Or exit if there is a problem reading the file.
    FastQStatus::Status status = FastQStatus::FASTQ_SUCCESS;
    while (keepReadingFile() &&
           ((myMaxErrors == -1) || (myMaxErrors > myNumErrors)))
    {
        // Validate one sequence. This call will read all the lines for
        // one sequence.
        status = readFastQSequence();
        if((status == FastQStatus::FASTQ_SUCCESS) || (status == FastQStatus::FASTQ_INVALID))
        {
            // Read a sequence and it is either valid or invalid, but
            // either way, a sequence was read, so increment the sequence count.
            ++numSequences;
        }
        else
        {
            // Other error, so break out of processing.
            break;
        }
    }

    // Report Base Composition Statistics.
    if(printBaseComp)
    {
        myBaseComposition.print();
    }

    if(printQualAvg)
    {
        printAvgQual();
    }

    std::string finishMessage = "Finished processing ";
    finishMessage += myFileName.c_str();
    char buffer[100];
    if(sprintf(buffer,
               " with %u lines containing %d sequences.",
               myLineNum, numSequences) > 0)
    {
        finishMessage += buffer;
        logMessage(finishMessage.c_str());
    }
    if(sprintf(buffer,
               "There were a total of %d errors.",
               myNumErrors) > 0)
    {
        logMessage(buffer);
    }

    // Close the input file.
    FastQStatus::Status closeStatus = closeFile();

    if((status != FastQStatus::FASTQ_SUCCESS) && (status != FastQStatus::FASTQ_INVALID) &&
       (status != FastQStatus::FASTQ_NO_SEQUENCE_ERROR))
    {
        // Stopped validating due to some error other than invalid, so
        // return that error.
        return(status);
    }
    else if(myNumErrors == 0)
    {
        // No errors, check to see if there were any sequences.
        // Finished processing all of the sequences in the file.
        // If there are no sequences, report an error.
        if(numSequences == 0)
        {
            // Empty file, return error.
            logMessage("ERROR: No FastQSequences in the file.");
            return(FastQStatus::FASTQ_NO_SEQUENCE_ERROR);
        }
        return(FastQStatus::FASTQ_SUCCESS);
    }
    else
    {
        // The file is invalid. But check the close status. If the close
        // failed, it means there is a problem with the file itself not just
        // with validation, so the close failure should be returned.
        if(closeStatus != FastQStatus::FASTQ_SUCCESS)
        {
            return(closeStatus);
        }
        return(FastQStatus::FASTQ_INVALID);
    }
}

void BaseComposition::print()
 Print the composition.

FastQStatus::Status openFile(const char *fileName, BaseAsciiMap::SPACE_TYPE spaceType=BaseAsciiMap::UNKNOWN)
 Open a FastQFile. Definition: FastQFile.cpp:92

FastQStatus::Status readFastQSequence()
 Read 1 FastQSequence, validating it. Definition: FastQFile.cpp:309

bool keepReadingFile()
 Returns whether or not to keep reading the file; it stops reading (false) if eof or there is a problem reading the file. Definition: FastQFile.cpp:193

FastQStatus::FASTQ_INVALID
 Means that the sequence was invalid. Definition: FastQStatus.h:33

FastQStatus::FASTQ_NO_SEQUENCE_ERROR
 Means there were no errors, but no sequences read. Definition: FastQStatus.h:38

References closeFile(), FastQStatus::FASTQ_INVALID, FastQStatus::FASTQ_NO_SEQUENCE_ERROR, FastQStatus::FASTQ_OPEN_ERROR, FastQStatus::FASTQ_SUCCESS, keepReadingFile(), openFile(), BaseComposition::print(), and readFastQSequence().
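
The choice of return value at the end of validateFastQFile() has several branches, so a stand-alone model may help when reading it. The sketch below is inferred from the comments and the status codes referenced above, not copied from the library: Status is a stand-in enum whose numeric values are not the library's, finalStatus is a hypothetical helper, and the exclusion of FASTQ_NO_SEQUENCE_ERROR in the first test is an assumption.

```cpp
#include <cassert>

// Stand-in for FastQStatus::Status with only the values named on this page.
// The numeric values are NOT taken from the library.
enum Status
{
    FASTQ_SUCCESS,
    FASTQ_INVALID,
    FASTQ_OPEN_ERROR,
    FASTQ_NO_SEQUENCE_ERROR
};

// Hypothetical helper modelling which status is returned after the read loop.
//   readStatus:   status of the last readFastQSequence() call
//   closeStatus:  status returned by closeFile()
//   numErrors:    total validation errors counted
//   numSequences: sequences read, valid or invalid
Status finalStatus(Status readStatus, Status closeStatus,
                   int numErrors, int numSequences)
{
    if (readStatus != FASTQ_SUCCESS && readStatus != FASTQ_INVALID &&
        readStatus != FASTQ_NO_SEQUENCE_ERROR)  // assumption, see lead-in
    {
        return readStatus;  // stopped on an error other than "invalid"
    }
    if (numErrors == 0)
    {
        // No errors: an empty file is still reported as an error.
        return (numSequences == 0) ? FASTQ_NO_SEQUENCE_ERROR : FASTQ_SUCCESS;
    }
    // The file is invalid, but a failed close takes precedence.
    return (closeStatus != FASTQ_SUCCESS) ? closeStatus : FASTQ_INVALID;
}
```

One consequence worth noting is that a file containing only invalid sequences yields FASTQ_INVALID rather than FASTQ_NO_SEQUENCE_ERROR, because invalid sequences still count as sequences read.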


The documentation for this class was generated from the following files: